RELATED APPLICATIONS

[0001] This application claims priority to U.S. Provisional Applications 60/290,029, 
60/290,645, 60/292,336, 60/295,798, 60/297,457, 60/298,884 and 60/303,459, and is a 
continuation-in-part of U.S. Application No. 09/917,800, filed July 31, 2001, now pending, 
all of which are herein incorporated by reference in their entirety. This application is also 
related to U.S. Provisional Applications 60/222,040 and 60/244,880, both of which are also 
herein incorporated by reference in their entirety. 

SEQUENCE LISTING SUBMISSION ON COMPACT DISC 

[0002] The Sequence Listing submitted concurrently herewith on compact disc under 37 
C.F.R. §§1.821(c) and 1.821(e) is herein incorporated by reference in its entirety. Three
copies of the Sequence Listing, one on each of three compact discs, are provided. Copy 1 and
Copy 2 are identical. Copies 1 and 2 are also identical to the CRF. Each electronic copy of 
the Sequence Listing was created on January 30, 2002 with a file size of 3083 KB. The file 
names are as follows: Copy 1- GL5038Ul.txt; Copy 2- GL5038Ul.txt; CRF- GL5038Ul.txt. 

BACKGROUND OF THE INVENTION 

[0003] The need for methods of assessing the toxic impact of a compound, pharmaceutical 
agent or environmental pollutant on a cell or living organism has led to the development of 
procedures which utilize living organisms as biological monitors. The simplest and most 
convenient of these systems utilize unicellular microorganisms such as yeast and bacteria, 
since they are most easily maintained and manipulated. Unicellular screening systems also 
often use easily detectable changes in phenotype to monitor the effect of test compounds on 
the cell. Unicellular organisms, however, are inadequate models for estimating the potential 
effects of many compounds on complex multicellular animals, as they do not have the ability 
to carry out biotransformations to the extent or at levels found in higher organisms. 
[0004] The biotransformation of chemical compounds by multicellular organisms is a 
significant factor in determining the overall toxicity of agents to which they are exposed. 
Accordingly, multicellular screening systems may be preferred or required to detect the toxic 
effects of compounds. The use of multicellular organisms as toxicology screening tools has 
been significantly hampered, however, by the lack of convenient screening mechanisms or 
endpoints, such as those available in yeast or bacterial systems. In addition, previous
attempts to produce toxicology prediction systems have failed to provide the necessary
modeling data and statistical information to accurately predict toxic responses (e.g., WO
00/12760, WO 00/47761, WO 00/63435, WO 01/32928, WO 01/38579, and the Affymetrix®
Rat Tox Chip).

SUMMARY OF THE INVENTION 

[0005] The present invention is based on the elucidation of the global changes in gene 
expression in tissues or cells exposed to known toxins, in particular hepatotoxins, as 
compared to unexposed tissues or cells as well as the identification of individual genes that 
are differentially expressed upon toxin exposure. 

[0006] In various aspects, the invention includes methods of predicting at least one toxic 
effect of a compound, predicting the progression of a toxic effect of a compound, and 
predicting the hepatotoxicity of a compound. The invention also includes methods of
identifying agents that modulate the onset or progression of a toxic response. Also provided 
are methods of predicting the cellular pathways that a compound modulates in a cell. The 
invention includes methods of identifying agents that modulate protein activities. 
[0007] In a further aspect, the invention provides probes comprising sequences that 
specifically hybridize to genes in Tables 1-3. Also provided are solid supports comprising at 
least two of the previously mentioned probes. The invention also includes a computer system 
that has a database containing information identifying the expression level in a tissue or cell 
sample exposed to a hepatotoxin of a set of genes comprising at least two genes in Tables 1-3.

DETAILED DESCRIPTION 

[0008] Many biological functions are accomplished by altering the expression of various 
genes through transcriptional (e.g. through control of initiation, provision of RNA precursors, 
RNA processing, etc.) and/or translational control. For example, fundamental biological 
processes such as cell cycle, cell differentiation and cell death are often characterized by the 
variations in the expression levels of groups of genes. 

[0009] Changes in gene expression are also associated with the effects of various 
chemicals, drugs, toxins, pharmaceutical agents and pollutants on an organism or cells. For 
example, the lack of sufficient expression of functional tumor suppressor genes and/or the
overexpression of oncogenes/proto-oncogenes after exposure to an agent could lead to
tumorigenesis or hyperplastic growth of cells (Marshall, Cell, 64: 313-326 (1991); Weinberg,
Science, 254:1138-1146 (1991)). Thus, changes in the expression levels of particular genes
(e.g., oncogenes or tumor suppressors) may serve as signposts for the presence and
progression of toxicity or other cellular responses to exposure to a particular compound.
[0010] Monitoring changes in gene expression may also provide certain advantages during 
drug screening and development. Often drugs are screened for the ability to interact with a 
major target without regard to other effects the drugs have on cells. These cellular effects 
may cause toxicity in the whole animal, which prevents the development and clinical use of 
the potential drug. 

[0011] The present inventors have examined tissue from animals exposed to known
hepatotoxins that induce detrimental liver effects in order to identify global changes in gene
expression induced by these compounds. These global changes in gene expression, which
can be detected by the production of expression profiles, provide useful toxicity markers that
can be used to monitor toxicity and/or toxicity progression caused by a test compound. Some of
these markers may also be used to monitor or detect various disease or physiological states, 
disease progression, drug efficacy and drug metabolism. 

Identification of Toxicity Markers 

[0012] To evaluate and identify gene expression changes that are predictive of toxicity, 
studies using selected compounds with well characterized toxicity have been conducted by 
the present inventors to catalogue altered gene expression during exposure in vivo and in 
vitro. In the present study, acyclovir, amitriptyline, alpha-naphthylisothiocyanate (ANIT),
acetaminophen, AY-25329, bicalutamide, carbon tetrachloride, clofibrate, cyproterone
acetate (CPA), diclofenac, diflunisal, dioxin, 17α-ethinylestradiol, hydrazine, indomethacin,
lipopolysaccharide, phenobarbital, tacrine, valproate, Wy-14643 and zileuton were selected
as known hepatotoxins.

[0013] The pathogenesis of acute carbon tetrachloride (CCl4)-induced hepatotoxicity follows a
well-characterized course in humans and experimental animals resulting in centrilobular necrosis
and steatosis, followed by hepatic regeneration and tissue repair. Severity of the 
hepatocellular injury is also dose-dependent and may be affected by species, age, gender and 
diet. 

[0014] Differences in susceptibility to CCl4 hepatotoxicity are primarily related to the
ability of the animal model to metabolize CCl4 to reactive intermediates. CCl4-induced
hepatotoxicity is dependent on CCl4 bioactivation to trichloromethyl free radicals by
cytochrome P450 enzymes (CYP2E1), localized primarily in centrizonal hepatocytes.
Formation of the free radicals leads to membrane lipid peroxidation and protein denaturation
resulting in hepatocellular damage or death.

[0015] The onset of hepatic injury is rapid following acute administration of CCl4 to male
rats. Morphologic studies have shown cytoplasmic accumulation of lipids in hepatocytes 
within 1 to 3 hours of dosing, and by 5 to 6 hours, focal necrosis and hydropic swelling of 
hepatocytes are evident. Centrilobular necrosis and inflammatory infiltration peak by 24 to 
48 hours post dose. The onset of recovery is also evident within this time frame by increased 
DNA synthesis and the appearance of mitotic figures. Removal of necrotic debris begins by 
48 hours and is usually completed by one week, with full restoration of the liver by 14 days. 
[0016] Increases in serum transaminase levels also parallel CCl4-induced hepatic
histopathology. In male Sprague Dawley (SD) rats, alanine aminotransferase (ALT) and
aspartate aminotransferase (AST) levels increase within 3 hours of CCl4 administration (0.1,
1, 2, 3, 4 mL/kg, ip; 2.5 mL/kg, po) and reach peak levels (approximately 5-10 fold increases)
within 48 hours post dose. Significant increases in serum α-glutathione S-transferase
(α-GST) levels have also been detected as early as 2 hours after CCl4 administration (25 µL/kg,
po) to male SD rats.

[0017] At the molecular level, induction of the growth-related proto-oncogenes, c-fos and
c-jun, is reportedly the earliest event detected in an acute model of CCl4-induced
hepatotoxicity (Schiaffonato et al., Liver 17:183-191 (1997)). Expression of these
immediate-early response genes has been detected within 30 minutes of a single dose of CCl4 to
mice (0.05-1.5 mL/kg, ip) and by 1 to 2 hours post dose in rats (2 mL/kg, po; 5 mL/kg, po)
(Schiaffonato et al., supra, and Hong et al., Yonsei Medical J 38:167-177 (1997)). Similarly,
hepatic c-myc gene expression is increased by 1 hour following an acute dose of CCl4 to male
SD rats (5 mL/kg, po) (Hong et al., supra). Expression of these genes following exposure to
CCl4 is rapid and transient. Peak hepatic mRNA levels for c-fos, c-jun, and c-myc after
acute administration of CCl4 have been reported at 1 to 2 hours, 3 hours, and 1 hour post
dose, respectively.

[0018] The expression of tumor necrosis factor-α (TNF-α) is also increased in the livers of
rodents exposed to CCl4, and TNF-α has been implicated in initiation of the hepatic repair
process. Pre-treatment with anti-TNF-α antibodies has been shown to prevent CCl4-mediated
increases in c-jun and c-fos gene expression, whereas administration of TNF-α induced rapid
expression of these genes (Bruccoleri et al., Hepatol 25:133-141 (1997)). Up-regulation of
transforming growth factor-β (TGF-β) and transforming growth factor receptors (TβRI-III)
later in the repair process (24 and 48 hours after CCl4 administration) suggests that TGF-β
may play a role in limiting the regenerative response by induction of apoptosis (Grasl-Kraupp
et al., Hepatol 28:717-726 (1998)).

[0019] Acetaminophen is a widely used analgesic that at supratherapeutic doses can be
metabolized to N-acetyl-p-benzoquinone imine (NAPQI), which causes hepatic and renal
failure. At the molecular level, until the present invention little was known about the effects
of acetaminophen.

[0020] Amitriptyline is a commonly used antidepressant, although it is recognized to have
toxic effects on the liver (Physicians' Desk Reference, 47th ed., Medical Economics Co., Inc.,
1993; Balkin, U.S. Patent No. 5,656,284). Nevertheless, amitriptyline's beneficial effects on
depression, as well as on sleep and dyspepsia (Mertz et al., Am J Gastroenterol 93(2):160-165
(1998)), migraines (Beubler, Wien Med Wochenschr 144(5-6):100-101 (1994)), arterial
hypertension (Bobkiewicz et al., Arch Immunol Ther Exp (Warsz) 23(4):543-547 (1975)) and
premature ejaculation (Smith et al., U.S. Patent No. 5,923,341) mandate its continued use.
[0021] Differences in susceptibility to amitriptyline toxicity are considered related to
differential metabolism. Amitriptyline-induced hepatotoxicity is primarily mediated by
development of cholestasis, the condition caused by the failure of the liver to secrete bile,
resulting in accumulation in blood plasma of substances normally secreted into bile, such as
bilirubin and bile salts. Cholestasis is also characterized by liver cell necrosis and bile duct
obstruction, which leads to increased pressure on the lumenal side of the canalicular
membrane and release of enzymes (alkaline phosphatase, 5'-nucleotidase, gamma-glutamyl
transpeptidase) normally localized on the canalicular membrane. These enzymes also begin
to accumulate in the plasma. Typical symptoms of cholestasis are general malaise, weakness,
nausea, anorexia and severe pruritus (Cecil Textbook of Medicine, 20th ed., part XII, pp.
772-773, 805-808, J. C. Bennett and F. Plum Eds., W. B. Saunders Co., Philadelphia, 1996).
[0022] The effects of amitriptyline or phenobarbital (PB) on phospholipid metabolism in rat 
liver have been studied. In one study, male Sprague-Dawley rats received amitriptyline 
orally in one dose of 600 mg/kg. PB was given intraperitoneally (IP) at a dosage of 80
mg/kg. Animals were sacrificed by decapitation at 6, 12, 18, and 24 hr. The phospholipid
level in liver was measured by enzymatic assay and by gas chromatography-mass 
spectrometry. Both agents caused an increase in the microsomal phosphatidylcholine 
content. Levels of glycerophosphate acyltransferase (GAT) and phosphatidate 
cytidylyltransferase (PCT) were slightly affected by amitriptyline but were significantly 
affected by PB. Levels of phosphatidate phosphohydrolase (PPH) and choline 
phosphotransferase (CPT) were significantly altered by amitriptyline and by PB (Hoshi et al.,
Chem Pharm Bull 38:3446-3448 (1990)). 

[0023] In another experiment, amitriptyline was given orally to male Sprague-Dawley rats 
(4-5 weeks old) in a single dose of 600 mg/kg. The animals were sacrificed 12 or 24 hours 
later. This caused a marked increase in δ-aminolevulinic acid (δ-ALA) activity at both time
points. Total heme and cytochrome b5 levels were increased but cytochrome P450 (CYP450)
content remained the same. The authors concluded that hepatic heme synthesis is increased
through prolonged induction of δ-ALA but this may be accounted for by the increases in
cytochrome b5 and total heme and not by the CYP450 content (Hoshi et al, Jpn J Pharmacol 
50:289-293 (1989)). 

[0024] Amitriptyline can cause hypersensitivity syndrome, a specific severe idiosyncratic 
reaction characterized by skin, liver, joint and haematological abnormalities (Milionis et al, 
Postgrad Med 76(896):361-363 (2000)). Amitriptyline has also been shown to cause drug- 
induced hepatitis, resulting in liver peroxisomes with impaired catalase function (De 
Creaemer et al., Hepatology 14(5):811-817 (1991)). The peroxisomes are greater in number,
but smaller in size and deformed in shape. Using cultured hepatocytes, the cytotoxicity of 
amitriptyline was examined and compared to other psychotropic drugs (Boelsterli et al, Cell 
Biol Toxicol 3(3):231-250 (1987)). The effects observed were release of lactate 
dehydrogenase from the cytosol, as well as impairment of biosynthesis and secretion of 
proteins, bile acids and glycolipids. 

[0025] Aromatic and aliphatic isothiocyanates are commonly used soil fumigants and 
pesticides (Shaaya et al, Pesticide Science 44(3):249-253 (1995); Cairns et al, J Assoc 
Official Analytical Chemists 71(3):547-550 (1988)). These compounds are also 
environmental hazards, however, because they remain as toxic residues in plants, either in 
their original or in a metabolized form (Cerny et al, J Agricultural and Food Chemistry 
44(12):3835-3839 (1996)) and because they are released from the soil into the surrounding 
air (Gan et al., J Agricultural and Food Chemistry 46(3):986-990 (1998)).
Alpha-naphthylthiourea, an amino-substituted form of ANIT, is a known rodenticide whose
principal toxic effects are pulmonary edema and pleural effusion, resulting from the action of
this compound on pulmonary capillaries. Microsomes from lung and liver release atomic
sulfur (Goodman and Gilman's The Pharmacological Basis of Therapeutics, 9th ed., chapter
67, p. 1690, J. G. Hardman et al. eds., McGraw-Hill, New York, NY, 1996).
[0026] In one study in rats, ANIT (80 mg/kg) was dissolved in olive oil and given orally to
male Wistar rats (180-320 g). All animals were fasted for 24 hours before ANIT treatment,
and blood and bile excretion were analyzed 24 hours later. Levels of total bilirubin, alkaline
phosphatase, serum glutamic oxaloacetic transaminase and serum glutamic pyruvic 
transaminase were found to be significantly increased, while ANIT reduced total bile flow, 
all of which are indications of severe biliary dysfunction. This model is used to induce 
cholestasis with jaundice because the injury is reproducible and dose-dependent. ANIT is 
metabolized by microsomal enzymes, and a metabolite plays a fundamental role in its toxicity 
(Tanaka et al., Clinical and Experimental Pharmacology and Physiology 20:543-547
(1993)).

[0027] ANIT fails to produce extensive necrosis, but has been found to produce 
inflammation and edema in the portal tract of the liver (Maziasa et al., Toxicol Appl
Pharmacol 110:365-373 (1991)). Livers treated with ANIT are significantly heavier than
control-treated counterparts and serum levels of alanine aminotransferase (ALT),
gamma-glutamyl transpeptidase (γ-GTP), total bilirubin, lipid peroxide and total bile acids showed
significant increases (Anonymous, Toxicol Lett 105:103-110 (2000)).

[0028] ANIT-induced hepatotoxicity may also be characterized by cholangiolitic hepatitis 
and bile duct damage. Acute hepatotoxicity caused by ANIT in rats is manifested as 
neutrophil-dependent necrosis of bile duct epithelial cells (BDECs) and hepatic parenchymal 
cells. These changes mirror the cholangiolitic hepatitis found in humans (Hill, Toxicol Sci 
47:118-125 (1999)). 

[0029] Exposure to ANIT also causes liver injury by the development of cholestasis, the 
condition caused by failure to secrete bile, resulting in accumulation in blood plasma of 
substances normally secreted into bile, such as bilirubin and bile salts. Cholestasis is also 
characterized by liver cell necrosis, including bile duct epithelial cell necrosis, and bile duct 
obstruction, which leads to increased pressure on the lumenal side of the canalicular 
membrane, decreased canalicular flow and release of enzymes normally localized on the 
canalicular membrane (alkaline phosphatase, 5'-nucleotidase, gamma-glutamyl
transpeptidase). These enzymes also begin to accumulate in the plasma. Typical symptoms
of cholestasis are general malaise, weakness, nausea, anorexia and severe pruritus (Cecil
Textbook of Medicine, 20th ed., part XII, pp. 772-773, 805-808, J. C. Bennett and F. Plum
Eds., W. B. Saunders Co., Philadelphia (1996) and Kossor et al., Toxicol Appl Pharmacol
119:108-114 (1993)).

[0030] ANIT-induced cholestasis is also characterized by abnormal serum levels of alanine
aminotransferase, aspartic acid aminotransferase and total bilirubin. In addition, hepatic lipid
peroxidation is increased, and the membrane fluidity of microsomes is decreased.
Histological changes include an infiltration of polymorphonuclear neutrophils and an elevated
number of apoptotic hepatocytes (Calvo et al., J Cell Biochem 80(4):461-470 (2001)). Other
known hepatotoxic effects of exposure to ANIT include a damaged antioxidant defense 
system, decreased activities of superoxide dismutase and catalase (Ohta et al, Toxicology 
139(3):265-275 (1999)), and the release of several proteases from the infiltrated neutrophils, 
alanine aminotransferase, cathepsin G, elastase, which mediate hepatocyte killing (Hill et al, 
Toxicol Appl Pharmacol 148(1):169-175 (1998)). 

[0031] Indomethacin is a non-steroidal antiinflammatory, antipyretic and analgesic drug 
commonly used to treat rheumatoid arthritis, osteoarthritis, ankylosing spondylitis, gout and a 
type of severe, chronic cluster headache characterized by many daily occurrences and jabbing 
pain. This drug acts as a potent inhibitor of prostaglandin synthesis; it inhibits the 
cyclooxygenase enzyme necessary for the conversion of arachidonic acid to prostaglandins 
(PDR 47th ed., Medical Economics Co., Inc., Montvale, NJ, 1993; Goodman & Gilman's The
Pharmacological Basis of Therapeutics 9th ed., J.G. Hardman et al. eds., McGraw Hill, New
York, 1996, pp. 1074-1075, 1089-1095; Cecil Textbook of Medicine, 20th ed., part XII, pp.
772-773, 805-808, J. C. Bennett and F. Plum Eds., W. B. Saunders Co., Philadelphia, 1996).
[0032] The most frequent adverse effects of indomethacin treatment are gastrointestinal 
disturbances, usually mild dyspepsia, although more severe conditions, such as bleeding, 
ulcers and perforations can occur. Hepatic involvement is uncommon, although some fatal 
cases of hepatitis and jaundice have been reported. Renal toxicity can also result, particularly 
after long-term administration. Renal papillary necrosis has been observed in rats, and 
interstitial nephritis with hematuria, proteinuria and nephrotic syndrome have been reported 
in humans. Patients suffering from renal dysfunction risk developing a reduction in renal 
blood flow, because renal prostaglandins play an important role in renal perfusion. 
[0033] In rats, although indomethacin produces more adverse effects in the gastrointestinal 
tract than in the liver, it has been shown to induce changes in hepatocytic cytochrome P450. 
In one study, no widespread changes in the liver were observed, but a mild, focal, 
centrilobular response was noted. Serum levels of albumin and total protein were 
significantly reduced, while the serum level of urea was increased. No changes in creatinine 
or aspartate aminotransferase (AST) levels were observed (Falzon et al., Br J Exp Path
66:527-534 (1985)). In another rat study, a single dose of indomethacin has been shown to
reduce liver and renal microsomal enzymes, including CYP450, within 24 hours.
Histopathological changes were not monitored, although there were lesions in the GI tract.
The effects on the liver seemed to be waning by 48 hours (Fracasso et al., Agents Actions
31:313-316 (1990)).

[0034] A study of hepatocytes, in which the relative toxicity of five nonsteroidal 
antiinflammatory agents was compared, showed that indomethacin was more toxic than the 
others. Levels of lactate dehydrogenase release and urea, as well as viability and 
morphology, were examined. Cells exposed to high levels of indomethacin showed cellular 
necrosis, nuclear pleomorphism, swollen mitochondria, fewer microvilli, smooth 
endoplasmic reticulum proliferation and cytoplasmic vacuolation (Sorensen et al., J Toxicol
Environ Health 16(3-4):425-440 (1985)).

[0035] 17α-ethinylestradiol, a synthetic estrogen, is a component of oral contraceptives,
often combined with the progestational compound norethindrone. It is also used in
post-menopausal estrogen replacement therapy (PDR 47th ed., pp. 2415-2420, Medical Economics
Co., Inc., Montvale, NJ, 1993; Goodman & Gilman's The Pharmacological Basis of
Therapeutics 9th ed., pp. 1419-1422, J.G. Hardman et al. Eds., McGraw Hill, New York,
1996).

[0036] The most frequent adverse effects of 17α-ethinylestradiol usage are increased risks
of cardiovascular disease: myocardial infarction, thromboembolism, vascular disease and 
high blood pressure, and of changes in carbohydrate metabolism, in particular, glucose 
intolerance and impaired insulin secretion. There is also an increased risk of developing 
benign hepatic neoplasia, although the incidence of this disease is very low. Because this 
drug decreases the rate of liver metabolism, it is cleared slowly from the liver, and 
carcinogenic effects, such as tumor growth, may result. 

[0037] In a recent study, 17α-ethinylestradiol was shown to cause a reversible intrahepatic
cholestasis in male rats, mainly by reducing the bile-salt-independent fraction of bile flow 
(BSIF) (Koopen et al., Hepatology 27:537-545 (1998)). Plasma levels of bilirubin, bile salts, 
aspartate aminotransferase (AST) and alanine aminotransferase (ALT) in this study were not 
changed. This study also showed that 17α-ethinylestradiol produced a decrease in plasma
cholesterol and plasma triglyceride levels, but an increase in the weight of the liver after 3 
days of drug administration, along with a decrease in bile flow. Further results from this 
study are as follows. The activities of the liver enzymes leucine aminopeptidase and alkaline 
phosphatase initially showed significant increases, but enzyme levels decreased after 3 days. 
Bilirubin output increased, although glutathione (GSH) output decreased. The increased 
secretion of bilirubin into the bile without affecting the plasma level suggests that the 
increased bilirubin production must be related to an increased degradation of heme from 
heme-containing proteins. Similar results were obtained in another experiment (Bouchard et
al, Liver 13:193-202 (1993)) in which the livers were also examined by light and electron 
microscopy. Despite the effects of the drug, visible changes in liver tissue were not observed. 
[0038] In another study of male rats, cholestasis was induced by daily subcutaneous 
injections of 17α-ethinylestradiol for five days. Cholestasis was assessed by measuring the
bile flow rate. Rats allowed to recover for five days after the end of drug treatment showed 
normal bile flow rates (Hamada et al, Hepatology 21:1455-1464 (1995)). 
[0039] An experiment with male and female rats (Mayol, Carcinogenesis 13:2381-2388 
(1992)) found that 17α-ethinylestradiol induced acute liver hyperplasia (increase in mitotic
index and BrdU staining) after two days of treatment, although growth regression occurred 
within the first few days of treatment. With long-term treatment, lasting hyperplasia was 
again observed after three to six months of administration of the drug. Apoptosis increased 
around day 3 and returned to normal by one week. Additional experiments in this same study 
showed that proliferating hepatocytes were predominantly located around a periportal zone of 
vacuolated hepatocytes, which were also induced by the treatment. Chronic induced 
activation was characterized by flow cytometry on hepatocytes isolated from male rats, and 
ploidy analysis of hepatocyte cell suspensions showed a considerably increased proportion of 
diploid hepatocytes. These diploid cells were the most susceptible to drug-induced 
proliferation. The results from this study support the theory that cell target populations exist 
that respond to the effects of tumor promoters. The susceptibility of the diploid hepatocytes 
to proliferation during treatment may explain, at least in part, the behavior of
17α-ethinylestradiol as a tumor promoter in the liver.

[0040] Wy-14643, a tumor-inducing compound that acts in the liver, has been used to study
the genetic profile of cells during the various stages of carcinogenic development, with a
view toward developing strategies for detecting, diagnosing and treating cancers (Rockett et
al., Toxicology 144(1-3):13-29 (2000)). In contrast to other carcinogens, Wy-14643 does not
mutate DNA directly. Instead, it acts on the peroxisome proliferator activated receptor-alpha
(PPARalpha), as well as on other signaling pathways that regulate growth (Johnson et al., J
Steroid Biochem Mol Biol 77(1):59-71 (2001)). The effect is elevated and sustained cell
replication, accompanied by a decrease in apoptosis (Rusyn et al, Carcinogenesis 
21(12):2141-2145 (2000)). These authors (Rusyn et al.) noted an increase in the expression 
of enzymes that repair DNA by base excision, but no increased expression of enzymes that do 
not repair oxidative damage to DNA. In a study on rodents, Johnson et al. noted that
Wy-14643 inhibited liver-X-receptor-mediated transcription in a dose-dependent manner, as well
as de novo sterol synthesis.

[0041] In experiments with mouse liver cells (Peters et al., Carcinogenesis 19(11):1989-1994
(1998)), exposure to Wy-14643 produced increased levels of acyl CoA oxidase and
proteins involved in cell proliferation: CDK-1, 2 and 4, PCNA and c-myc. Elevated levels
may be caused by accelerated transcription that is mediated directly or indirectly by 
PPARalpha. It is likely that the carcinogenic properties of peroxisome proliferators are due 
to the PPARalpha-dependent changes in levels of cell cycle regulatory proteins. 
[0042] Another study on rodents (Keller et al., Biochim Biophys Acta 1102(2):237-244
(1992)) showed that Wy-14643 was capable of uncoupling oxidative phosphorylation in rat
liver mitochondria. Rates of urea synthesis from ammonia and bile flow, two energy- 
dependent processes, were reduced, indicating that the energy supply for these processes was 
disrupted as a result of cellular exposure to the toxin. 

[0043] Wy-14643 has also been shown to activate nuclear factor kappaB, NADPH oxidase
and superoxide production in Kupffer cells (Rusyn et al, Cancer Res 60(17):4798-4803 
(2000)). NADPH oxidase is known to induce mitogens, which cause proliferation of liver 
cells. 

[0044] CPA is a potent androgen antagonist and has been used to treat acne, male pattern 
baldness, precocious puberty, and prostatic hyperplasia and carcinoma (Goodman & 
Gilman's The Pharmacological Basis of Therapeutics 9th ed., p. 1453, J.G. Hardman et al.,
Eds., McGraw Hill, New York, 1996). Additionally, CPA has been used clinically in 
hormone replacement therapy (HRT). CPA is useful in HRT as it protects the endometrium, 
decreases menopausal symptoms, and lessens osteoporotic fracture risk (Schneider, "The role 
of antiandrogens in hormone replacement therapy," Climacteric 3 (Suppl. 2): 21-27 (2000)). 
[0045] Although CPA has numerous clinical applications, it is tumorigenic, mitogenic, and 
mutagenic. CPA has been used to treat patients with adenocarcinoma of the prostate;
however, in two documented cases (Macdonald et al., Clin Oncol 13:135-137 (2001)),
patients developed femoral head avascular necrosis following CPA treatment. In one study
(Krebs et al., Carcinogenesis 19(2):241-245 (1998)), Big Blue transgenic F344 rats were
given varying doses of CPA. As the dose of CPA increased, so did the mutation frequency,
but a threshold dose was not determined. Another study (Werner et al., Mutat Res
395(2-3):179-187 (1997)) showed that CPA caused the formation of DNA adducts in primary cultures
of human hepatocytes. The authors suggest that the genotoxicity associated with CPA may
be due to the double bond in position 6-7 of the steroid.
[0046] In additional experiments with rats (Kasper et al., Carcinogenesis 17(10):2271-2274
(1996)), CPA was shown to induce unscheduled DNA synthesis in vitro. After a single
oral dose of 100 mg CPA/kg body weight, continuous DNA repair activity was observed after 
16 hours. Furthermore, CPA increased the occurrence of S phase cells, which corroborated 
the mitogenic potential of CPA in rat liver. 

[0047] CPA has also been shown to produce cirrhosis (Garty et al, Eur J Pediatr 158(5): 
367-370 (1999)). A child, who had been treated with CPA for over 4 years for hypothalamic 
syndrome and precocious puberty, developed cirrhosis. Even though the medication was 
discontinued, the child eventually succumbed to sepsis and multiorgan failure four years 
later. 

[0048] In one study on rat liver treated with CPA (Bursch et al., Arch Toxicol 69(4): 253- 
258 (1995)), the expression of clusterin, a marker for apoptosis, was examined and measured 
by Northern and slot blot analysis. Bursch et al. showed that, after CPA administration, the 
clusterin mRNA concentration increased. Moreover, in situ hybridization demonstrated 
that clusterin was expressed in all hepatocytes and therefore is not limited to cells in the 
process of death by apoptosis. 

[0049] Diclofenac, a non-steroidal anti-inflammatory drug, has been frequently 
administered to patients suffering from rheumatoid arthritis, osteoarthritis, and ankylosing 
spondylitis. Following oral administration, diclofenac is rapidly absorbed and then 
metabolized in the liver by a cytochrome P450 isozyme of the CYP2C subfamily (Goodman & 
Gilman's The Pharmacological Basis of Therapeutics, 9th ed., p. 637, J.G. Hardman et al., 
eds., McGraw Hill, New York, 1996). In addition, diclofenac has been applied topically to 
treat pain due to corneal damage (Jayamanne et al., Eye 11(Pt. 1): 79-83 (1997); Dornic et 
al., "Topical diclofenac sodium in the management of anesthetic abuse keratopathy," Am J 
Ophthalmol 125(5): 719-721 (1998)). 

[0050] Although diclofenac has numerous clinical applications, adverse side-effects have 
been associated with the drug. In one study, out of 16 patients suffering from corneal 
complications associated with diclofenac use, six experienced corneal or scleral melts, three 
experienced ulceration, and two experienced severe keratopathy (Guidera et al., 
Ophthalmology 108(5): 936-944 (2001)). Another report described a term newborn who had 
premature closure of the ductus arteriosus as a result of maternal treatment with diclofenac 
(Zenker et al., J Perinat Med 26(3): 231-234 (1998)). Although the treatment was given only 
two weeks prior to delivery, the newborn had severe pulmonary hypertension and required 
22 days of treatment with high doses of inhaled nitric oxide. 
[0051] Another study investigated 180 cases of patients who had reported adverse reactions 
to diclofenac to the Food and Drug Administration (Banks et al., Hepatology 22(3): 820-827 
(1995)). Of the 180 reported cases, the most common symptom was jaundice (75% of the 
symptomatic patients). Liver sections were taken and analyzed, and hepatic injury was 
apparent one month after drug treatment. An additional report showed that a patient 
developed severe hepatitis five weeks after beginning diclofenac treatment for osteoarthritis 
(Bhogaraju et al., South Med J 92(7): 711-713 (1999)). Within a few months following the 
cessation of diclofenac treatment there was complete restoration of liver functions. 
[0052] In one study on diclofenac-treated Wistar rats (Ebong et al., Afr J Med Sci 27(3-4): 
243-246 (1998)), diclofenac treatment increased serum levels of 
alanine aminotransferase, aspartate aminotransferase, methaemoglobin, and total and 
conjugated bilirubin. Additionally, diclofenac enhanced the activity of alkaline phosphatase 
and 5'-nucleotidase. Another study showed that humans given diclofenac had elevated levels 
of hepatic transaminases and serum creatine when compared to the control group (McKenna 
et al., Scand J Rheumatol 30(1): 11-18 (2001)). 

[0053] The anti-hypertensive drug AY-25329 (Wyeth-Ayerst) exhibits nephrotoxicity in the 
proximal, and possibly distal, tubules of the kidney. Although no data on its effects in 
humans is publicly available, the inventors of the present invention have observed minor 
changes associated with liver necrosis in rats. Specifically, increased mitosis rates and 
decreased glycogen levels were seen in all rats examined, indicating some measure of toxic 
response. 

[0054] Bicalutamide is a non-steroidal anti-androgen that is a mixed-oxidase inducer. This 
drug causes liver enlargement. Its effects on the liver have been described in studies on rats 
and dogs, but have not been demonstrated in humans (Iswaran et al., J Toxicol Sci 22(2):75-88 
(1997)). Studies by the instant inventors have shown an increase in mitosis rates and a 
minor degree of hepatocellular hypertrophy in the rat. 

[0055] Clofibrate is a peroxisome proliferator that has also been reported to cause 
non-genotoxic carcinogenicity in rodent livers (Qu et al., Free Radic Biol Med 31(5):659-969 
(2001); Mochizuki et al., Carcinogenesis 3(9):1027 (1982)). This compound is also known to 
cause liver enlargement (IARC Geneva: World Health Organization, International Agency 
for Research on Cancer, 1972-Present, p. V24 45 (1980); Fort et al., Toxicology 28(4):305 
(1983)). Studies by the present inventors show early increases in AST and ALT levels 
followed by dose-dependent hepatocellular alterations and increased mitotic activity. 
[0056] Diflunisal is a non-steroidal anti-inflammatory drug that is thought to exhibit 
toxicity in humans, but not in rodent animal models. Its effects in rat hepatocytes, however, 
have been documented (Masubuchi et al., J Pharmacol Exp Ther 287(1):208-213 (1998)). In 
addition, as a class of compounds, NSAIDs are infamous for their toxic effects (Johnson et 
al., Drugs Aging 1(2):130-143 (1991)). 

[0057] Dioxin (2,3,7,8-tetrachlorodibenzo-p-dioxin) is known to cause hepatocellular 
carcinogenicity in rodent animal models (NTP; Bioassay of 
2,3,7,8-Tetrachlorodibenzo-p-dioxin, p. v, DHHS Publication No. (NIH) 80-1765 (1980)), 
although this effect is known to be specific to certain sensitive strains (Viluksela et al., 
Cancer Res 60(24):6911-6920 (2000)). 
This chemical also causes liver cancers in humans (IARC Geneva: World Health 
Organization, International Agency for Research on Cancer, 1972-Present, p. 69 342 (1997)). 
[0058] Hydrazine (Isoniazid) is a known liver carcinogen in the rodent and is also thought 
to cause steatosis (Waterfield et al., Arch Toxicol 67(4):244-254 (1993); American 
Conference of Governmental Industrial Hygienists, Inc., 6th ed., vols. I-III, p. 761, ACGIH, 
Cincinnati, OH, 1991). It may be carcinogenic in humans as well, but the data in humans are 
not yet sufficient to be conclusive. Hydrazine's toxicity has also been documented in rat 
primary cultured hepatocytes (Ghatineh et al., Toxicology in Vitro 8(3):393-399 (1994)). 
[0059] Lipopolysaccharides are known endotoxins that induce inflammation (hepatitis) in 
the rat liver (Nolan, Hepatology 1(5):458-65 (1981)). They have also been shown to induce 
cytotoxicity in primary cultured rat hepatocytes and in Kupffer cells (Hartung et al., Biochem 
Pharmacol 42(5):1129-1135 (1991)). 

[0060] Phenobarbital is a barbiturate that is a known Cytochrome P450 inducer. Chronic 
dosing of this compound is known to induce non-genotoxic tumorigenesis (Whysner et al., 
Pharmacol Ther 71(1-2):153-191 (1996)). 

[0061] Tacrine, a strong acetylcholinesterase (AChE) inhibitor, is used in the treatment of 
mild to moderate cases of Alzheimer's dementia. The effect seen in patients is a reversal of the 
cognitive and functional decline, but the drug does not appear to change the 
neurodegenerative process (Goodman & Gilman's The Pharmacological Basis of 
Therapeutics, 9th ed., p. 174, Hardman et al., eds., McGraw Hill, New York, 1996). 
[0062] Hepatotoxicity caused by tacrine is typically reversible, although cases of severe 
hepatotoxicity have been seen. In one case study, a 75-year-old woman suffering from 
Alzheimer's disease had been administered tacrine for a period of 14 months (Blackard et al., 
J Clin Gastroenterol 26:57-59 (1998)). The woman developed progressive jaundice, 
followed by hepatic failure and death. 
[0063] Preclinical studies failed to detect adverse hepatic events (Viau et al., Drug Chem 
Toxicol 16: 227-239 (1993)). While hepatotoxicity has been found in humans, in vivo rat 
studies have not shown a correlation between tacrine and hepatotoxicity, and the mechanism 
of action is not completely understood. In one in vitro study, tacrine displayed cytotoxicity to 
human hepatoma cell lines and primary rat hepatocytes (Viau et al., supra). Another in vitro 
study compared the reaction of human and rat liver microsomal preparations to tacrine 
(Woolf et al., Drug Metab Dispos 21:874-882 (1993)). The study showed that the two 
species reacted differently to the drug, suggesting that the rat may not be the best model for 
monitoring tacrine-induced elevations in liver marker enzymes. 

[0064] While tacrine does not reveal classic signs of hepatotoxicity in rats, gene expression 
changes due to tacrine administration can be used to predict that the drug will be a liver toxin 
in humans. This suggests that toxicogenomics might be able to detect drugs that prove to be 
toxic in the clinic, even when classical but more crude measures in preclinical screening fail 
to detect toxicity. 

[0065] Valproate (valproic acid) is an anti-convulsant that causes fatty liver and necrosis in 
both humans and rodents (Eadie, Med Toxicol Adverse Drug Exp 3(2):85-106 (1988); Lewis, 
Hepatology 2(6):870-873, (1982)). This compound is also known to cause severe 
developmental defects (Briggs et al., A Reference Guide to Fetal and Neonatal Risk. Drugs 
in Pregnancy and Lactation, 4th ed., p. 869, Williams & Wilkins, Baltimore, MD 1994). 
[0066] Zileuton is thought to cause general inflammation (hepatitis) in the liver of humans. 
Its effects in rodents are minimal, with some observed cytochrome P450 induction and weak 
peroxisome proliferation (Rodrigues et al., Toxicol Appl Pharmacol 137(2): 193-201 (1996)). 
[0067] Acyclovir (9-[(2-hydroxyethoxy)methyl]guanine, Zovirax®), an anti-viral guanosine 
analogue, is used to treat herpes simplex virus (HSV), varicella zoster virus (VZV) and 
Epstein-Barr virus (EBV) infections. It is phosphorylated by virally encoded thymidine 
kinase (TK) and converted to its activated di- and triphosphate forms by other kinases. Viral 
polymerases preferentially incorporate acyclovir, over natural bases, into viral DNA, but, 
because acyclovir is incorporated as a monophosphate, chain elongation is terminated. 
Acyclovir is not effective against viruses or viral mutants that lack TK (Fields Virology 3d 
ed., Fields et al., eds., pp. 436-440, Lippincott-Raven Publishers, Philadelphia, 1996; Cecil 
Textbook of Medicine, 20th ed., part XII, p. 1742, J. C. Bennett and F. Plum Eds., W. B. 
Saunders Co., Philadelphia, 1996). 

[0068] The pharmacokinetics of acyclovir show that it has a half-life of about three hours 
and that most of it is excreted in the urine largely unchanged (Brigden et al., "The clinical 
pharmacology of acyclovir and its prodrugs," Scand J Infect Dis Suppl 47:33-39, 1985). The 
most frequent adverse effect of acyclovir treatment is damage to various parts of the kidney, 
particularly the renal tubules, where the precipitation of crystals of acyclovir can occur 
(Fogazzi, "Crystalluria: a neglected aspect of urinary sediment analysis," Nephrol Dial 
Transplant 11(2):379-387, 1996). Although acyclovir is primarily a renal toxin, it has been 
shown to induce liver inflammation (hepatitis) (Physicians' Desk Reference, 56th ed., p. 1707, 
Medical Economics Co. Inc., Montvale, NJ, 2002). Findings of hepatotoxicity in animals 
have not yet been published. 

Toxicity Prediction and Modeling 

[0069] The genes and gene expression information, as well as the portfolios and subsets of 
the genes provided in Tables 1-3, may be used to predict at least one toxic effect, including 
the hepatotoxicity of a test or unknown compound. As used herein, at least one toxic effect 
includes, but is not limited to, a detrimental change in the physiological status of a cell or 
organism. The response may be, but is not required to be, associated with a particular 
pathology, such as tissue necrosis. Accordingly, the toxic effect includes effects at the 
molecular and cellular level. Hepatotoxicity is an effect as used herein and includes but is 
not limited to the pathologies of liver necrosis, hepatitis, fatty liver and protein adduct 
formation. As used herein, a gene expression profile comprises any quantitative 
representation of the expression of at least one mRNA species in a cell sample or population 
and includes profiles made by various methods such as differential display, PCR, 
hybridization analysis, etc. 

[0070] In general, assays to predict the toxicity or hepatotoxicity of a test agent (or 
compound or multi-component composition) comprise the steps of exposing a cell population 
to the test compound, assaying or measuring the level of relative or absolute gene expression 
of one or more of the genes in Tables 1-3 and comparing the identified expression level(s) to 
the expression levels disclosed in the Tables and database(s) disclosed herein. Assays may 
include the measurement of the expression levels of about 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 
30, 50, 75, 100 or more genes from Tables 1-3. 

[0071] In the methods of the invention, the gene expression level for a gene or genes 
induced by the test agent, compound or compositions may be comparable to the levels found 
in the Tables or databases disclosed herein if the expression level varies within a factor of 
about 2, about 1.5 or about 1.0 fold. In some cases, the expression levels are comparable if 
the agent induces a change in the expression of a gene in the same direction (e.g., up or 
down) as a reference toxin. 
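
By way of illustration only, the two comparability criteria described in paragraph [0071] may be sketched in Python as follows. The function names, the default two-fold cutoff, and the representation of a change as a signed value relative to control (for example, a log ratio) are assumptions made for this sketch; they are not taken from the Tables or databases disclosed herein.

```python
def within_fold(test_level, reference_level, fold=2.0):
    """Return True if two positive expression levels differ by no more
    than `fold`-fold in either direction (hypothetical helper)."""
    if test_level <= 0 or reference_level <= 0:
        raise ValueError("expression levels must be positive")
    ratio = test_level / reference_level
    # Treat a 2-fold increase and a 2-fold decrease symmetrically.
    return max(ratio, 1.0 / ratio) <= fold


def same_direction(test_change, reference_change):
    """Return True if both changes relative to control are up-regulations
    or both are down-regulations (signed values, e.g. log ratios)."""
    return test_change * reference_change > 0


# Illustrative values only (not data from the specification).
print(within_fold(150.0, 100.0, fold=1.5))   # 1.5-fold apart
print(within_fold(100.0, 250.0))             # 2.5-fold apart
print(same_direction(1.2, 0.4))              # both up
```

Either criterion may be applied gene by gene; which criterion and which cutoff are appropriate depends on the embodiment.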
[0072] The cell population that is exposed to the test agent, compound or composition may 
be exposed in vitro or in vivo. For instance, cultured or freshly isolated hepatocytes, in 
particular rat hepatocytes, may be exposed to the agent under standard laboratory and cell 
culture conditions. In another assay format, in vivo exposure may be accomplished by 
administration of the agent to a living animal, for instance a laboratory rat. 
[0073] Procedures for designing and conducting toxicity tests in in vitro and in vivo systems 
are well known, and are described in many texts on the subject, such as Loomis et al., 
Loomis's Essentials of Toxicology, 4th Ed., Academic Press, New York, 1996; Ecobichon, 
The Basics of Toxicity Testing, CRC Press, Boca Raton, 1992; Frazier, editor, In Vitro 
Toxicity Testing, Marcel Dekker, New York, 1992; and the like. 

[0074] In in vivo toxicity testing, two groups of test organisms are usually employed: one 
group serves as a control and the other group receives the test compound in a single dose (for 
acute toxicity tests) or a regimen of doses (for prolonged or chronic toxicity tests). Because, 
in some cases, the extraction of tissue as called for in the methods of the invention requires 
sacrificing the test animal, both the control group and the group receiving compound must be 
large enough to permit removal of animals for sampling tissues, if it is desired to observe the 
dynamics of gene expression through the duration of an experiment. 
[0075] In setting up a toxicity study, extensive guidance is provided in the literature for 
selecting the appropriate test organism for the compound being tested, route of 
administration, dose ranges, and the like. Water or physiological saline (0.9% NaCl in water) 
is the solvent of choice for the test compound, since these solvents permit administration by a 
variety of routes. When this is not possible because of solubility limitations, vegetable oils 
such as corn oil or organic solvents such as propylene glycol may be used. 
[0076] Regardless of the route of administration, the volume required to administer a given 
dose is limited by the size of the animal that is used. It is desirable to keep the volume of 
each dose uniform within and between groups of animals. When rats or mice are used, the 
volume administered by the oral route generally should not exceed about 0.005 ml per gram 
of animal. Even when aqueous or physiological saline solutions are used for parenteral 
injection the volumes that are tolerated are limited, although such solutions are ordinarily 
thought of as being innocuous. The intravenous LD50 of distilled water in the mouse is 
approximately 0.044 ml per gram and that of isotonic saline is 0.068 ml per gram of mouse. 
In some instances, the route of administration to the test animal should be the same as, or as 
similar as possible to, the route of administration of the compound to man for therapeutic 
purposes. 
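
As a worked illustration of the oral volume limit stated in paragraph [0076], the following Python sketch computes the largest oral dose volume for an animal of a given weight and the minimum solution concentration needed to deliver a given dose within that volume. The function names and the example animal weight and dose are assumptions chosen for illustration; only the 0.005 ml per gram limit comes from the text.

```python
MAX_ORAL_ML_PER_GRAM = 0.005  # oral volume limit for rats or mice, per the text


def max_oral_volume_ml(body_weight_g):
    """Largest oral dose volume (ml) for an animal of the given weight (g)."""
    if body_weight_g <= 0:
        raise ValueError("body weight must be positive")
    return MAX_ORAL_ML_PER_GRAM * body_weight_g


def min_concentration_mg_per_ml(dose_mg_per_kg, body_weight_g):
    """Lowest dosing-solution concentration (mg/ml) that delivers the
    requested dose without exceeding the oral volume limit."""
    dose_mg = dose_mg_per_kg * body_weight_g / 1000.0
    return dose_mg / max_oral_volume_ml(body_weight_g)


# Hypothetical example: a 250 g rat dosed at 100 mg/kg.
print(max_oral_volume_ml(250.0))                  # about 1.25 ml
print(min_concentration_mg_per_ml(100.0, 250.0))  # about 20 mg/ml
```

Because the volume limit scales with body weight, the minimum concentration depends only on the dose per kilogram, which is consistent with keeping the volume of each dose uniform within and between groups.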
[0077] When a compound is to be administered by inhalation, special techniques for 
generating test atmospheres are necessary. The methods usually involve aerosolization or 
nebulization of fluids containing the compound. If the agent to be tested is a fluid that has an 
appreciable vapor pressure, it may be administered by passing air through the solution under 
controlled temperature conditions. Under these conditions, dose is estimated from the 
volume of air inhaled per unit time, the temperature of the solution, and the vapor pressure of 
the agent involved. Gases are metered from reservoirs. When particles of a solution are to be 
administered, unless the particle size is less than about 2 µm the particles will not reach the 
terminal alveolar sacs in the lungs. A variety of apparatuses and chambers are available to 
perform studies for detecting effects of irritant or other toxic endpoints when they are 
administered by inhalation. The preferred method of administering an agent to animals is via 
the oral route, either by intubation or by incorporating the agent in the feed. 
[0078] When cells are exposed to the agent in vitro or in cell culture, the cell population to be 
exposed to the agent may be divided into two or more subpopulations, for instance, by 
dividing the population into two or more identical aliquots. In some preferred embodiments 
of the methods of the invention, the cells to be exposed to the agent are derived from liver 
tissue. For instance, cultured or freshly isolated rat hepatocytes may be used. 
[0079] The methods of the invention may be used generally to predict at least one toxic 
response, and, as described in the Examples, may be used to predict the likelihood that a 
compound or test agent will induce various specific liver pathologies, such as liver necrosis, 
fatty liver disease, protein adduct formation, hepatitis, or other pathologies associated with at 
least one of the toxins herein described. The methods of the invention may also be used to 
determine the similarity of a toxic response to one or more individual compounds. In 
addition, the methods of the invention may be used to predict or elucidate the potential 
cellular pathways influenced, induced or modulated by the compound or test agent due to the 
similarity of the expression profile compared to the profile induced by a known toxin (see 
Tables 3A-3DD). 
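
One way to picture the profile comparison described in paragraph [0079] is the following Python sketch, which scores a test profile against reference profiles of known toxins. The specification does not prescribe a similarity measure; the use of Pearson correlation, the function names, and the gene and toxin labels below are all assumptions made for illustration.

```python
import math


def profile_similarity(test_profile, reference_profile):
    """Pearson correlation of expression changes over the genes present
    in both profiles (dicts mapping gene name -> change vs. control)."""
    shared = sorted(set(test_profile) & set(reference_profile))
    if len(shared) < 2:
        raise ValueError("need at least two shared genes")
    x = [test_profile[g] for g in shared]
    y = [reference_profile[g] for g in shared]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("profile has no variation across shared genes")
    return cov / math.sqrt(var_x * var_y)


def most_similar_toxin(test_profile, reference_profiles):
    """Return the best-matching reference name and all similarity scores."""
    scores = {name: profile_similarity(test_profile, ref)
              for name, ref in reference_profiles.items()}
    return max(scores, key=scores.get), scores


# Hypothetical profiles; gene and toxin labels are placeholders.
test = {"geneA": 1.0, "geneB": -1.0, "geneC": 0.5}
references = {
    "toxin_1": {"geneA": 2.0, "geneB": -2.0, "geneC": 1.0},
    "toxin_2": {"geneA": -2.0, "geneB": 2.0, "geneC": -1.0},
}
best, scores = most_similar_toxin(test, references)
print(best, scores)
```

A high score against a given reference toxin would suggest, under these assumptions, that the test agent may influence similar cellular pathways.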

Diagnostic Uses for the Toxicity Markers 

[0080] As described above, the genes and gene expression information or portfolios of the 
genes with their expression information as provided in Tables 1-3 may be used as diagnostic 
markers for the prediction or identification of the physiological state of a tissue or cell sample 
that has been exposed to a compound or to identify or predict the toxic effects of a compound 
or agent. For instance, a tissue sample such as a sample of peripheral blood cells or some 
other easily obtainable tissue sample may be assayed by any of the methods described above, 
and the expression levels from a gene or genes from Tables 1-3 may be compared to the 
expression levels found in tissues or cells exposed to the toxins described herein. These 
methods may result in the diagnosis of a physiological state in the cell or may be used to 
identify the potential toxicity of a compound, for instance a new or unknown compound or 
agent. The comparison of expression data, as well as available sequence or other information, 
may be done by a researcher or diagnostician or may be done with the aid of a computer and 
databases as described below. 

[0081] In another format, the levels of a gene(s) of Tables 1-3, its encoded protein(s), or 
any metabolite produced by the encoded protein may be monitored or detected in a sample, 
such as a bodily tissue or fluid sample to identify or diagnose a physiological state of an 
organism. Such samples may include any tissue or fluid sample, including urine, blood and 
easily obtainable cells such as peripheral lymphocytes. 

Use of the Markers for Monitoring Toxicity Progression 

[0082] As described above, the genes and gene expression information provided in Tables 
1-3 may also be used as markers for the monitoring of toxicity progression, such as that 
found after initial exposure to a drug, drug candidate, toxin, pollutant, etc. For instance, a 
tissue or cell sample may be assayed by any of the methods described above, and the 
expression levels from a gene or genes from Tables 1-3 may be compared to the expression 
levels found in tissue or cells exposed to the hepatotoxins described herein. The comparison 
of the expression data, as well as available sequence or other information, may be done by 
a researcher or diagnostician or may be done with the aid of a computer and databases. 

Use of the Toxicity Markers for Drug Screening 

[0083] According to the present invention, the genes identified in Tables 1-3 may be used 
as markers or drug targets to evaluate the effects of a candidate drug, chemical compound or 
other agent on a cell or tissue sample. The genes may also be used as drug targets to screen 
for agents that modulate their expression and/or activity. In various formats, a candidate drug 
or agent can be screened for the ability to stimulate the transcription or expression of a given 
marker or markers or to down-regulate or counteract the transcription or expression of a 
marker or markers. According to the present invention, one can also compare the specificity 
of a drug's effects by looking at the number of markers which the drug induces and 
comparing them. More specific drugs will have fewer transcriptional targets. Similar sets of 
markers identified for two drugs may indicate a similarity of effects. 
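
The comparison of drug specificity and marker-set similarity described in paragraph [0083] may be sketched as follows. The threshold used to call a marker "induced", the Jaccard index used to score overlap, and the marker names are assumptions introduced for this illustration and are not specified herein.

```python
def induced_markers(changes, threshold=1.0):
    """Set of markers whose absolute change vs. control meets the threshold."""
    return {gene for gene, change in changes.items() if abs(change) >= threshold}


def marker_overlap(markers_a, markers_b):
    """Jaccard index of two marker sets (0 = disjoint, 1 = identical)."""
    union = markers_a | markers_b
    return len(markers_a & markers_b) / len(union) if union else 0.0


# Hypothetical expression changes for two drugs.
drug_x = {"g1": 2.0, "g2": -1.5, "g3": 0.2}
drug_y = {"g1": 1.8, "g2": 0.1, "g4": -2.2}

markers_x = induced_markers(drug_x)   # {"g1", "g2"}
markers_y = induced_markers(drug_y)   # {"g1", "g4"}

# Fewer induced markers suggests a more specific drug;
# a larger overlap suggests more similar effects.
print(len(markers_x), len(markers_y), marker_overlap(markers_x, markers_y))
```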
[0084] Assays to monitor the expression of a marker or markers as defined in Tables 1-3 
may utilize any available means of monitoring for changes in the expression level of the 
nucleic acids of the invention. As used herein, an agent is said to modulate the expression of 
a nucleic acid of the invention if it is capable of up- or down-regulating expression of the 
nucleic acid in a cell. 

[0085] In one assay format, gene chips containing probes to one, two or more genes from 
Tables 1-3 may be used to directly monitor or detect changes in gene expression in the treated 
or exposed cell. Cell lines, tissues or other samples are first exposed to a test agent and in 
some instances, a known toxin, and the detected expression levels of one or more, or 
preferably 2 or more of the genes of Tables 1-3 are compared to the expression levels of 
those same genes exposed to a known toxin alone. Compounds that modulate the expression 
patterns of the known toxin(s) would be expected to modulate potential toxic physiological 
effects in vivo. The genes in Tables 1-3 are particularly appropriate markers in these assays as 
they are differentially expressed in cells upon exposure to a known hepatotoxin. 
[0086] In another format, cell lines that contain reporter gene fusions between the open 
reading frame and/or the transcriptional regulatory regions of a gene in Tables 1-3 and any 
assayable fusion partner may be prepared. Numerous assayable fusion partners are known 
and readily available including the firefly luciferase gene and the gene encoding 
chloramphenicol acetyltransferase (Alam et al, Anal Biochem 188:245-254 (1990)). Cell 
lines containing the reporter gene fusions are then exposed to the agent to be tested under 
appropriate conditions and time. Differential expression of the reporter gene between 
samples exposed to the agent and control samples identifies agents which modulate the 
expression of the nucleic acid. 

[0087] Additional assay formats may be used to monitor the ability of the agent to modulate 
the expression of a gene identified in Tables 1-3. For instance, as described above, mRNA 
expression may be monitored directly by hybridization of probes to the nucleic acids of the 
invention. Cell lines are exposed to the agent to be tested under appropriate conditions and 
time and total RNA or mRNA is isolated by standard procedures such as those disclosed in 
Sambrook et al. (Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor 
Laboratory Press, Cold Spring Harbor, NY, 1989). 

[0088] In another assay format, cells or cell lines are first identified which express the gene 
products of the invention physiologically. Cell and/or cell lines so identified would be 
expected to comprise the necessary cellular machinery such that the fidelity of modulation of 
the transcriptional apparatus is maintained with regard to exogenous contact of agent with 
appropriate surface transduction mechanisms and/or the cytosolic cascades. Further, such 
cells or cell lines may be transduced or transfected with an expression vehicle (e.g., a plasmid 
or viral vector) construct comprising an operable non-translated 5'-promoter-containing end 
of the structural gene encoding the gene products of Tables 1-3 fused to one or more 
antigenic fragments or other detectable markers, which are peculiar to the instant gene 
products, wherein said fragments are under the transcriptional control of said promoter and 
are expressed as polypeptides whose molecular weight can be distinguished from the 
naturally occurring polypeptides or may further comprise an immunologically distinct or 
other detectable tag. Such a process is well known in the art (see Sambrook et al, supra). 
[0089] Cells or cell lines transduced or transfected as outlined above are then contacted 
with agents under appropriate conditions; for example, the agent comprises a 
pharmaceutically acceptable excipient and is contacted with cells comprised in an aqueous 
physiological buffer such as phosphate buffered saline (PBS) at physiological pH, Eagles 
balanced salt solution (BSS) at physiological pH, PBS or BSS comprising serum or 
conditioned media comprising PBS or BSS and/or serum incubated at 37°C. Said conditions 
may be modulated as deemed necessary by one of skill in the art. Subsequent to contacting 
the cells with the agent, said cells are disrupted and the polypeptides of the lysate are 
fractionated such that a polypeptide fraction is pooled and contacted with an antibody to be 
further processed by immunological assay (e.g., ELISA, immunoprecipitation or Western 
blot). The pool of proteins isolated from the agent-contacted sample is then compared with 
the control samples (no exposure and exposure to a known toxin) where only the excipient is 
contacted with the cells and an increase or decrease in the immunologically generated signal 
from the agent-contacted sample compared to the control is used to distinguish the 
effectiveness and/or toxic effects of the agent. 

[0090] Another embodiment of the present invention provides methods for identifying 
agents that modulate at least one activity of a protein(s) encoded by the genes in Tables 1-3. 
Such methods or assays may utilize any means of monitoring or detecting the desired 
activity. 

[0091] In one format, the relative amounts of a protein (Tables 1-3) between a cell 
population that has been exposed to the agent to be tested compared to an un-exposed control 
cell population and a cell population exposed to a known toxin may be assayed. In this 
format, probes such as specific antibodies are used to monitor the differential expression of 
the protein in the different cell populations. Cell lines or populations are exposed to the agent 
to be tested under appropriate conditions and time. Cellular lysates may be prepared from the 
exposed cell line or population and a control, unexposed cell line or population. The cellular 
lysates are then analyzed with the probe, such as a specific antibody. 
[0092] Agents that are assayed in the above methods can be randomly selected or rationally 
selected or designed. As used herein, an agent is said to be randomly selected when the agent 
is chosen randomly without considering the specific sequences involved in the association of 
a protein of the invention alone or with its associated substrates, binding partners, etc. An 
example of randomly selected agents is the use of a chemical library or a peptide combinatorial 
library, or a growth broth of an organism. 

[0093] As used herein, an agent is said to be rationally selected or designed when the agent 
is chosen on a nonrandom basis which takes into account the sequence of the target site 
and/or its conformation in connection with the agent's action. Agents can be rationally 
selected or rationally designed by utilizing the peptide sequences that make up these sites. 
For example, a rationally selected peptide agent can be a peptide whose amino acid sequence 
is identical to or a derivative of any functional consensus site. 

[0094] The agents of the present invention can be, as examples, peptides, small molecules, 
vitamin derivatives, as well as carbohydrates. Dominant negative proteins, DNAs encoding 
these proteins, antibodies to these proteins, peptide fragments of these proteins or mimics of 
these proteins may be introduced into cells to affect function. "Mimic" used herein refers to 
the modification of a region or several regions of a peptide molecule to provide a structure 
chemically different from the parent peptide but topographically and functionally similar to 
the parent peptide (see G.A. Grant in: Molecular Biology and Biotechnology, Meyers, ed., 
pp. 659-664, VCH Publishers, New York, 1995). A skilled artisan can readily recognize that 
there is no limit as to the structural nature of the agents of the present invention. 

Nucleic Acid Assay Formats 

[0095] The genes identified as being differentially expressed upon exposure to a known 
hepatotoxin (Tables 1-3) may be used in a variety of nucleic acid detection assays to detect or 
quantitate the expression level of a gene or multiple genes in a given sample. The genes 
described in Tables 1-3 may also be used in combination with one or more additional genes 
whose differential expression is associated with toxicity in a cell or tissue. In preferred 
embodiments, the genes in Tables 1-3 may be combined with one or more of the genes 
described in prior and related applications 60/222,040, 60/244,880, 60/290,029, 60/290,645, 
60/292,336, 60/295,798, 60/297,457, 60/298,884, 60/303,459 and 09/917,800, all of which 
are incorporated by reference on page 1 of this application. 

[0096] Any assay format to detect gene expression may be used. For example, traditional 
Northern blotting, dot or slot blot, nuclease protection, primer directed amplification, RT- 
PCR, semi- or quantitative PCR, branched-chain DNA and differential display methods may 
be used for detecting gene expression levels. Those methods are useful for some 
embodiments of the invention. In cases where smaller numbers of genes are detected, 
amplification based assays may be most efficient. Methods and assays of the invention, 
however, may be most efficiently designed with hybridization-based methods for detecting 
the expression of a large number of genes. 

[0097] Any hybridization assay format may be used, including solution-based and solid 
support-based assay formats. Solid supports containing oligonucleotide probes for 
differentially expressed genes of the invention can be filters, polyvinyl chloride dishes, 
particles, beads, microparticles or silicon or glass based chips, etc. Such chips, wafers and 
hybridization methods are widely available, for example, those disclosed by Beattie (WO 
95/11755). 

[0098] Any solid surface to which oligonucleotides can be bound, either directly or 
indirectly, either covalently or non-covalently, can be used. A preferred solid support is a 
high density array or DNA chip. These contain a particular oligonucleotide probe in a 
predetermined location on the array. Each predetermined location may contain more than 
one molecule of the probe, but each molecule within the predetermined location has an 
identical sequence. Such predetermined locations are termed features. There may be, for 
example, from 2, 10, 100, 1000 to 10,000, 100,000 or 400,000 or more of such features on a 
single solid support. The solid support, or the area within which the probes are attached may 
be on the order of about a square centimeter. Probes corresponding to the genes of Tables 1- 
3 or from the related applications described above may be attached to single or multiple solid 
support structures, e.g., the probes may be attached to a single chip or to multiple chips to 
comprise a chip set. 

[0099] Oligonucleotide probe arrays for expression monitoring can be made and used 
according to any techniques known in the art (see for example, Lockhart et al., Nat 
Biotechnol 14:1675-1680 (1996); McGall et al, Proc Nat Acad Sci USA 93:13555-13460 
(1996)). Such probe arrays may contain at least two or more oligonucleotides that are 
complementary to or hybridize to two or more of the genes described in Tables 1-3. For 
instance, such arrays may contain oligonucleotides that are complementary or hybridize to at 
least 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 50, 70, 100 or more of the genes described herein. 
Preferred arrays contain all or nearly all of the genes listed in Tables 1-3, or individually, the 
gene sets of Tables 3A-3DD. In a preferred embodiment, arrays are constructed that contain 
oligonucleotides to detect all or nearly all of the genes in any one of or all of Tables 1-3 on a 
single solid support substrate, such as a chip. 

[0100] The sequences of the expression marker genes of Tables 1-3 are in the public 
databases. Table 1 provides the GenBank Accession Number for each of the sequences (see 
www.ncbi.nlm.nih.gov/). The sequences of the genes in GenBank are expressly herein 
incorporated by reference in their entirety as of the filing date of this application, as are 
related sequences, for instance, sequences from the same gene of different lengths, variant 
sequences, polymorphic sequences, genomic sequences of the genes and related sequences 
from different species, including the human counterparts, where appropriate. These 
sequences may be used in the methods of the invention or may be used to produce the probes 
and arrays of the invention. In some embodiments, the genes in Tables 1-3 that correspond to 
the genes or fragments previously associated with a toxic response may be excluded from the 
Tables. 

[0101] As described above, in addition to the sequences of the GenBank Accession 
Numbers disclosed in Tables 1-3, sequences such as naturally occurring variant or 
polymorphic sequences may be used in the methods and compositions of the invention. For 
instance, expression levels of various allelic or homologous forms of a gene disclosed in 
Tables 1-3 may be assayed. Any and all nucleotide variations that do not alter the functional 
activity of a gene listed in Tables 1-3, including all naturally occurring allelic variants of 
the genes herein disclosed, may be used in the methods and to make the compositions (e.g., 
arrays) of the invention. 

[0102] Probes based on the sequences of the genes described above may be prepared by any 
commonly available method. Oligonucleotide probes for screening or assaying a tissue or 
cell sample are preferably of sufficient length to specifically hybridize only to appropriate, 
complementary genes or transcripts. Typically the oligonucleotide probes will be at least 
about 10, 12, 14, 16, 18, 20 or 25 nucleotides in length. In some cases, longer probes of at 
least 30, 40, or 50 nucleotides will be desirable. 

[0103] As used herein, oligonucleotide sequences that are complementary to one or more of 
the genes described in Tables 1-3 refer to oligonucleotides that are capable of hybridizing 
under stringent conditions to at least part of the nucleotide sequences of said genes. Such 
hybridizable oligonucleotides will typically exhibit at least about 75% sequence identity at 
the nucleotide level to said genes, preferably about 80% or 85% sequence identity or more 
preferably about 90% or 95% or more sequence identity to said genes. 
[0104] "Bind(s) substantially" refers to complementary hybridization between a probe 
nucleic acid and a target nucleic acid and embraces minor mismatches that can be 
accommodated by reducing the stringency of the hybridization media to achieve the desired 
detection of the target polynucleotide sequence. 

[0105] The terms "background" or "background signal intensity" refer to hybridization 
signals resulting from non-specific binding, or other interactions, between the labeled target 
nucleic acids and components of the oligonucleotide array (e.g., the oligonucleotide probes, 
control probes, the array substrate, etc.). Background signals may also be produced by 
intrinsic fluorescence of the array components themselves. A single background signal can 
be calculated for the entire array, or a different background signal may be calculated for each 
target nucleic acid. In a preferred embodiment, background is calculated as the average 
hybridization signal intensity for the lowest 5% to 10% of the probes in the array, or, where a 
different background signal is calculated for each target gene, for the lowest 5% to 10% of 
the probes for each gene. Of course, one of skill in the art will appreciate that where the 
probes to a particular gene hybridize well and thus appear to be specifically binding to a 
target sequence, they should not be used in a background signal calculation. Alternatively, 
background may be calculated as the average hybridization signal intensity produced by 
hybridization to probes that are not complementary to any sequence found in the sample (e.g. 
probes directed to nucleic acids of the opposite sense or to genes not found in the sample 
such as bacterial genes where the sample is mammalian nucleic acids). Background can also 
be calculated as the average signal intensity produced by regions of the array that lack any 
probes at all. 
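The first background calculation described above (the average of the lowest 5% to 10% of probe intensities) can be sketched as follows. This is only an illustrative sketch: the function name `background_intensity` and its `percent` parameter are hypothetical, the intensities are assumed to be plain numbers already read from the array, and the exclusion of probes that hybridize well is not modeled.

```python
def background_intensity(intensities, percent=5):
    """Average signal of the lowest `percent` percent of probe intensities.

    `percent` is an integer between 1 and 100; the text suggests 5 to 10.
    """
    if not intensities:
        raise ValueError("no probe intensities supplied")
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    ordered = sorted(intensities)
    # Integer arithmetic avoids rounding surprises; always keep at least one probe.
    count = max(1, len(ordered) * percent // 100)
    lowest = ordered[:count]
    return sum(lowest) / len(lowest)
```

The same function can be applied to the whole array or, where a separate background is wanted for each target gene, to the probes of one gene at a time.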

[0106] The phrase "hybridizing specifically to" refers to the binding, duplexing, or 
hybridizing of a molecule substantially to or only to a particular nucleotide sequence or 
sequences under stringent conditions when that sequence is present in a complex mixture 
(e.g., total cellular) DNA or RNA. 

[0107] Assays and methods of the invention may utilize available formats to simultaneously 
screen at least about 100, preferably about 1000, more preferably about 10,000 and most 
preferably about 1,000,000 different nucleic acid hybridizations. 

[0108] As used herein a "probe" is defined as a nucleic acid, capable of binding to a target 
nucleic acid of complementary sequence through one or more types of chemical bonds, 
usually through complementary base pairing, usually through hydrogen bond formation. As 
used herein, a probe may include natural (i.e., A, G, U, C, or T) or modified bases (7- 
deazaguanosine, inosine, etc.). In addition, the bases in probes may be joined by a linkage 
other than a phosphodiester bond, so long as it does not interfere with hybridization. Thus, 
probes may be peptide nucleic acids in which the constituent bases are joined by peptide 
bonds rather than phosphodiester linkages. 

[0109] The term "perfect match probe" refers to a probe that has a sequence that is perfectly 
complementary to a particular target sequence. The test probe is typically perfectly 
complementary to a portion (subsequence) of the target sequence. The perfect match (PM) 
probe can be a "test probe", a "normalization control" probe, an expression level control 
probe and the like. A perfect match control or perfect match probe is, however, distinguished 
from a "mismatch control" or "mismatch probe." 

[0110] The terms "mismatch control" or "mismatch probe" refer to a probe whose sequence 
is deliberately selected not to be perfectly complementary to a particular target sequence. For 
each mismatch (MM) control in a high-density array there typically exists a corresponding 
perfect match (PM) probe that is perfectly complementary to the same particular target 
sequence. The mismatch may comprise one or more bases. 

[0111] While the mismatch(es) may be located anywhere in the mismatch probe, terminal 
mismatches are less desirable as a terminal mismatch is less likely to prevent hybridization of 
the target sequence. In a particularly preferred embodiment, the mismatch is located at or 
near the center of the probe such that the mismatch is most likely to destabilize the duplex 
with the target sequence under the test hybridization conditions. 
[0112] The term "stringent conditions" refers to conditions under which a probe will 
hybridize to its target subsequence, but with only insubstantial hybridization to other 
sequences or to other sequences such that the difference may be identified. Stringent 
conditions are sequence-dependent and will be different in different circumstances. Longer 
sequences hybridize specifically at higher temperatures. Generally, stringent conditions are 
selected to be about 5°C lower than the thermal melting point (Tm) for the specific sequence 
at a defined ionic strength and pH. 

[0113] Typically, stringent conditions will be those in which the salt concentration is at 
least about 0.01 to 1.0 M Na+ ion concentration (or other salts) at pH 7.0 to 8.3 and the 
temperature is at least about 30°C for short probes (e.g., 10 to 50 nucleotides). Stringent 
conditions may also be achieved with the addition of destabilizing agents such as formamide. 
[0114] The "percentage of sequence identity" or "sequence identity" is determined by 
comparing two optimally aligned sequences or subsequences over a comparison window or 
span, wherein the portion of the polynucleotide sequence in the comparison window may 
optionally comprise additions or deletions (i.e., gaps) as compared to the reference sequence 
(which does not comprise additions or deletions) for optimal alignment of the two sequences. 
The percentage is calculated by determining the number of positions at which the identical 
subunit (e.g. nucleic acid base or amino acid residue) occurs in both sequences to yield the 
number of matched positions, dividing the number of matched positions by the total number 
of positions in the window of comparison and multiplying the result by 100 to yield the 
percentage of sequence identity. Percentage sequence identity when calculated using the 
programs GAP or BESTFIT (see below) is calculated using default gap weights. 
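The calculation just described can be illustrated with a short sketch. It assumes the two sequences have already been optimally aligned by some other tool and are supplied as equal-length strings in which `-` marks a gap; the function name `percent_identity` is hypothetical and the sketch does not reproduce GAP or BESTFIT.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of window positions holding the identical subunit.

    Both inputs must be pre-aligned strings of equal length; the
    comparison window is taken to be the full alignment.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    # A position counts as matched only when both sequences carry the same
    # non-gap character.
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    return 100.0 * matches / len(aligned_a)
```

For example, the ten-position alignment `ACGT-ACGTA` against `ACGTTACGAA` has eight matched positions, giving 80% identity.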

Probe design 

[0115] One of skill in the art will appreciate that an enormous number of array designs are 
suitable for the practice of this invention. The high density array will typically include a 
number of test probes that specifically hybridize to the sequences of interest. Probes may be 
produced from any region of the genes identified in the Tables and the attached representative 
sequence listing. In instances where the gene reference in the Tables is an EST, probes may 
be designed from that sequence or from other regions of the corresponding full-length 
transcript that may be available in any of the sequence databases, such as those herein 
described. See WO 99/32660 for methods of producing probes for a given gene or genes. In 
addition, any available software may be used to produce specific probe sequences, including, 
for instance, software available from Molecular Biology Insights, Olympus Optical Co. and 
Biosoft International. In a preferred embodiment, the array will also include one or more 
control probes. 

[0116] High density array chips of the invention include "test probes." Test probes may be 
oligonucleotides that range from about 5 to about 500, or about 7 to about 50 nucleotides, 
more preferably from about 10 to about 40 nucleotides and most preferably from about 15 to 
about 35 nucleotides in length. In other particularly preferred embodiments, the probes are 
20 or 25 nucleotides in length. In another preferred embodiment, test probes are double or 
single strand DNA sequences. DNA sequences are isolated or cloned from natural sources or 
amplified from natural sources using native nucleic acid as templates. These probes have 
sequences complementary to particular subsequences of the genes whose expression they are 
designed to detect. Thus, the test probes are capable of specifically hybridizing to the target 
nucleic acid they are to detect. 
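As a rough illustration of test probes that are complementary to subsequences of a target, the sketch below tiles fixed-length probes along a target sequence. The names `reverse_complement` and `tile_probes` are hypothetical, the target is assumed to be a DNA-alphabet string written 5' to 3', and no specificity or secondary-structure screening is attempted.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA-alphabet sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def tile_probes(target, length=25, step=1):
    """List probes of `length` nt, each complementary to one window of `target`."""
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    target = target.upper()
    return [
        reverse_complement(target[start:start + length])
        for start in range(0, len(target) - length + 1, step)
    ]
```

The default of 25 nucleotides follows one of the preferred probe lengths stated above; a real design would go on to filter these candidates.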

[0117] In addition to test probes that bind the target nucleic acid(s) of interest, the high 
density array can contain a number of control probes. The control probes may fall into three 
categories referred to herein as 1) normalization controls; 2) expression level controls; and 3) 
mismatch controls. 

[0118] Normalization controls are oligonucleotide or other nucleic acid probes that are 
complementary to labeled reference oligonucleotides or other nucleic acid sequences that are 
added to the nucleic acid sample to be screened. The signals obtained from the normalization 
controls after hybridization provide a control for variations in hybridization conditions, label 
intensity, "reading" efficiency and other factors that may cause the signal of a perfect 
hybridization to vary between arrays. In a preferred embodiment, signals (e.g., fluorescence 
intensity) read from all other probes in the array are divided by the signal (e.g., fluorescence 
intensity) from the control probes thereby normalizing the measurements. 
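The normalization step of the preferred embodiment amounts to a simple division, sketched below. The function name `normalize_signals` is hypothetical, and using the mean of several control probes as the divisor is an assumption made for the sketch; the text only states that signals are divided by the signal from the control probes.

```python
def normalize_signals(probe_signals, control_signals):
    """Divide each probe signal by the mean normalization-control signal.

    probe_signals: dict mapping a probe name to its raw intensity.
    control_signals: raw intensities of the normalization control probes.
    """
    if not control_signals:
        raise ValueError("at least one control signal is required")
    control_mean = sum(control_signals) / len(control_signals)
    if control_mean <= 0:
        raise ValueError("control signal must be positive")
    return {name: value / control_mean for name, value in probe_signals.items()}
```

Values normalized this way are unitless ratios, which makes readings from different arrays easier to compare.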

[0119] Virtually any probe may serve as a normalization control. However, it is recognized 
that hybridization efficiency varies with base composition and probe length. Preferred 
normalization probes are selected to reflect the average length of the other probes present in 
the array; however, they can be selected to cover a range of lengths. The normalization 
control(s) can also be selected to reflect the (average) base composition of the other probes in 
the array; however, in a preferred embodiment, only one or a few probes are used and they are 
selected such that they hybridize well (i.e., no secondary structure) and do not match any 
target-specific probes. 

[0120] Expression level controls are probes that hybridize specifically with constitutively 
expressed genes in the biological sample. Virtually any constitutively expressed gene 
provides a suitable target for expression level controls. Typically expression level control 
probes have sequences complementary to subsequences of constitutively expressed 
"housekeeping genes" including, but not limited to the actin gene, the transferrin receptor 
gene, the GAPDH gene, and the like. 

[0121] Mismatch controls may also be provided for the probes to the target genes, for 
expression level controls or for normalization controls. Mismatch controls are 
oligonucleotide probes or other nucleic acid probes identical to their corresponding test or 
control probes except for the presence of one or more mismatched bases. A mismatched 
base is a base selected so that it is not complementary to the corresponding base in the target 
sequence to which the probe would otherwise specifically hybridize. One or more 
mismatches are selected such that under appropriate hybridization conditions (e.g., stringent 
conditions) the test or control probe would be expected to hybridize with its target sequence, 
but the mismatch probe would not hybridize (or would hybridize to a significantly lesser 
extent). Preferred mismatch probes contain a central mismatch. Thus, for example, where a 
probe is a 20-mer, a corresponding mismatch probe will have the identical sequence except 
for a single base mismatch (e.g., substituting a G, a C or a T for an A) at any of positions 6 
through 14 (the central mismatch). 
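A central-mismatch probe of the kind described can be derived from its perfect match probe as sketched below. The function name `mismatch_probe` is hypothetical, and replacing the chosen base with its complement is just one assumed choice; the text permits any non-matching base at the selected position.

```python
_SWAP = {"A": "T", "T": "A", "G": "C", "C": "G"}


def mismatch_probe(perfect_match, position=None):
    """Return a copy of the probe with a single substituted base.

    `position` is 1-based; when omitted, the central base is used
    (position 10 of a 20-mer, position 13 of a 25-mer).
    """
    pm = perfect_match.upper()
    if position is None:
        position = (len(pm) + 1) // 2
    if not 1 <= position <= len(pm):
        raise ValueError("position lies outside the probe")
    index = position - 1
    # Swap in the complementary base so the new base can no longer pair
    # with the target at this position.
    return pm[:index] + _SWAP[pm[index]] + pm[index + 1:]
```

For a 20-mer the default position is 10, which falls inside the range of positions 6 through 14 given above.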

[0122] Mismatch probes thus provide a control for non-specific binding or cross 
hybridization to a nucleic acid in the sample other than the target to which the probe is 
directed. For example, if the target is present the perfect match probes should be consistently 
brighter than the mismatch probes. In addition, if all central mismatches are present, the 
mismatch probes can be used to detect a mutation, for instance, a mutation of a gene in the 
accompanying Tables 1-3. The difference in intensity between the perfect match and the 
mismatch probe provides a good measure of the concentration of the hybridized material. 
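The perfect match minus mismatch difference can be summarized for a gene as sketched below. The function name `average_difference` is hypothetical, and averaging the difference over all probe pairs for a gene is an assumption of the sketch rather than a procedure stated in the text.

```python
def average_difference(pairs):
    """Mean of (PM - MM) intensity over the probe pairs for one gene.

    pairs: iterable of (perfect_match_intensity, mismatch_intensity) tuples.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one probe pair is required")
    return sum(pm - mm for pm, mm in pairs) / len(pairs)
```

A clearly positive value is consistent with the target being present, since the perfect match probes are then expected to be consistently brighter than their mismatch partners.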

Nucleic Acid Samples 

[0123] Cell or tissue samples may be exposed to the test agent in vitro or in vivo. When 
cultured cells or tissues are used, appropriate mammalian liver extracts may also be added 
with the test agent to evaluate agents that may require biotransformation to exhibit toxicity. 
In a preferred format, primary isolates of animal or human hepatocytes which already express 
the appropriate complement of drug-metabolizing enzymes may be exposed to the test agent 
without the addition of mammalian liver extracts. 

[0124] The genes which are assayed according to the present invention are typically in the 
form of mRNA or reverse transcribed mRNA. The genes may be cloned or not. The genes 
may be amplified or not. The cloning and/or amplification do not appear to bias the 
representation of genes within a population. In some assays, it may be preferable, however, 
to use polyA+ RNA as a source, as it can be used with fewer processing steps. 
[0125] As is apparent to one of ordinary skill in the art, nucleic acid samples used in the 
methods and assays of the invention may be prepared by any available method or process. 
Methods of isolating total mRNA are well known to those of skill in the art. For example, 
methods of isolation and purification of nucleic acids are described in detail in Chapter 3 of 
Laboratory Techniques in Biochemistry and Molecular Biology, Vol. 24: Hybridization With 
Nucleic Acid Probes: Theory and Nucleic Acid Probes, P. Tijssen, Ed., Elsevier Press, New 
York, 1993. Such samples include RNA samples, but also include cDNA synthesized from a 
mRNA sample isolated from a cell or tissue of interest. Such samples also include DNA 
amplified from the cDNA, and RNA transcribed from the amplified DNA. One of skill in the 
art would appreciate that it is desirable to inhibit or destroy RNase present in homogenates 
before homogenates are used. 

[0126] Biological samples may be of any biological tissue or fluid or cells from any 
organism as well as cells raised in vitro, such as cell lines and tissue culture cells. Frequently 
the sample will be a tissue or cell sample that has been exposed to a compound, agent, drug, 
pharmaceutical composition, potential environmental pollutant or other composition. In 
some formats, the sample will be a "clinical sample" which is a sample derived from a 
patient. Typical clinical samples include, but are not limited to, sputum, blood, blood-cells 
(e.g., white cells), tissue or fine needle biopsy samples, urine, peritoneal fluid, and pleural 
fluid, or cells therefrom. 

[0127] Biological samples may also include sections of tissues, such as frozen sections or 
formalin fixed sections taken for histological purposes. 

Forming High Density Arrays 

[0128] Methods of forming high density arrays of oligonucleotides with a minimal number 
of synthetic steps are known. The oligonucleotide analogue array can be synthesized on a 
single or on multiple solid substrates by a variety of methods, including, but not limited to, 
light-directed chemical coupling, and mechanically directed coupling (see Pirrung, U.S. 
Patent No. 5,143,854). 

[0129] In brief, the light-directed combinatorial synthesis of oligonucleotide arrays on a 
glass surface proceeds using automated phosphoramidite chemistry and chip masking 
techniques. In one specific implementation, a glass surface is derivatized with a silane 
reagent containing a functional group, e.g., a hydroxyl or amine group blocked by a 
photolabile protecting group. Photolysis through a photolithographic mask is used selectively 
to expose functional groups which are then ready to react with incoming 5' photoprotected 
nucleoside phosphoramidites. The phosphoramidites react only with those sites which are 
illuminated (and thus exposed by removal of the photolabile blocking group). Thus, the 
phosphoramidites only add to those areas selectively exposed from the preceding step. These 
steps are repeated until the desired array of sequences has been synthesized on the solid 
surface. Combinatorial synthesis of different oligonucleotide analogues at different locations 
on the array is determined by the pattern of illumination during synthesis and the order of 
addition of coupling reagents. 

[0130] In addition to the foregoing, additional methods which can be used to generate an 
array of oligonucleotides on a single substrate are described in PCT Publication Nos. WO 
93/09668 and WO 01/23614. High density nucleic acid arrays can also be fabricated by 
depositing pre-made or natural nucleic acids in predetermined positions. Synthesized or 
natural nucleic acids are deposited on specific locations of a substrate by light directed 
targeting and oligonucleotide directed targeting. Another embodiment uses a dispenser that 
moves from region to region to deposit nucleic acids in specific spots. 

Hybridization 

[0131] Nucleic acid hybridization simply involves contacting a probe and target nucleic 
acid under conditions where the probe and its complementary target can form stable hybrid 
duplexes through complementary base pairing. See WO 99/32660. The nucleic acids that do 
not form hybrid duplexes are then washed away leaving the hybridized nucleic acids to be 
detected, typically through detection of an attached detectable label. It is generally 
recognized that nucleic acids are denatured by increasing the temperature or decreasing the 
salt concentration of the buffer containing the nucleic acids. Under low stringency conditions 
(e.g., low temperature and/or high salt) hybrid duplexes (e.g., DNA:DNA, RNA:RNA, or 
RNA:DNA) will form even where the annealed sequences are not perfectly complementary. 
Thus, specificity of hybridization is reduced at lower stringency. Conversely, at higher 
stringency (e.g., higher temperature or lower salt) successful hybridization tolerates fewer 
mismatches. One of skill in the art will appreciate that hybridization conditions may be 
selected to provide any degree of stringency. 

[0132] In a preferred embodiment, hybridization is performed at low stringency, in this case 
in 6X SSPET at 37°C (0.005% Triton X-100), to ensure hybridization and then subsequent 
washes are performed at higher stringency (e.g., 1X SSPET at 37°C) to eliminate mismatched 
hybrid duplexes. Successive washes may be performed at increasingly higher stringency 
(e.g., down to as low as 0.25X SSPET at 37°C to 50°C) until a desired level of hybridization 
specificity is obtained. Stringency can also be increased by addition of agents such as 
formamide. Hybridization specificity may be evaluated by comparison of hybridization to 
the test probes with hybridization to the various controls that can be present (e.g., expression 
level control, normalization control, mismatch controls, etc.). 

[0133] In general, there is a tradeoff between hybridization specificity (stringency) and 
signal intensity. Thus, in a preferred embodiment, the wash is performed at the highest 
stringency that produces consistent results and that provides a signal intensity greater than 
approximately 10% of the background intensity. Thus, in a preferred embodiment, the 
hybridized array may be washed at successively higher stringency solutions and read between 
each wash. Analysis of the data sets thus produced will reveal a wash stringency above 
which the hybridization pattern is not appreciably altered and which provides adequate signal 
for the particular oligonucleotide probes of interest. 

Signal Detection 

[0134] The hybridized nucleic acids are typically detected by detecting one or more labels 
attached to the sample nucleic acids. The labels may be incorporated by any of a number of 
means well known to those of skill in the art. See WO 99/32660. 

Databases 

[0135] The present invention includes relational databases containing sequence 
information, for instance, for the genes of Tables 1-3, as well as gene expression information 
from tissue or cells exposed to various standard toxins, such as those herein described (see 
Tables 3A-3DD). Databases may also contain information associated with a given sequence 
or tissue sample such as descriptive information about the gene associated with the sequence 
information (see Table 1), or descriptive information concerning the clinical status of the 
tissue sample, or the animal from which the sample was derived. The database may be 
designed to include different parts, for instance a sequence database and a gene expression 
database. Methods for the configuration and construction of such databases and computer- 
readable media to which such databases are saved are widely available, for instance, see U.S. 
Patent No. 5,953,727, which is herein incorporated by reference in its entirety. 
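A minimal sketch of a relational layout with separate sequence and gene expression parts is given below, using Python's standard `sqlite3` module. The table names, column names and helper functions are all hypothetical and chosen only for illustration; they are not taken from the cited patent or from the Tables.

```python
import sqlite3

SCHEMA = """
CREATE TABLE gene (
    accession   TEXT PRIMARY KEY,   -- GenBank accession number
    description TEXT                -- descriptive information about the gene
);
CREATE TABLE expression (
    accession TEXT NOT NULL REFERENCES gene(accession),
    toxin     TEXT NOT NULL,        -- standard toxin the sample was exposed to
    level     REAL NOT NULL         -- measured expression level
);
"""


def build_database(genes, measurements):
    """Create an in-memory database and load gene and expression rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO gene VALUES (?, ?)", genes)
    conn.executemany("INSERT INTO expression VALUES (?, ?, ?)", measurements)
    conn.commit()
    return conn


def mean_level(conn, accession, toxin):
    """Average stored expression level of one gene for one toxin, or None."""
    row = conn.execute(
        "SELECT AVG(level) FROM expression WHERE accession = ? AND toxin = ?",
        (accession, toxin),
    ).fetchone()
    return row[0]
```

Keeping gene descriptions and expression measurements in separate tables mirrors the division into a sequence database and a gene expression database mentioned above.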
[0136] The databases of the invention may be linked to an outside or external database such 
as GenBank (www.ncbi.nlm.nih.gov/entrez.index.html); KEGG (www.genome.ad.jp/kegg); 
SPAD (www.grt.kyushu-u.ac.jp/spad/index.html); HUGO (www.gene.ucl.ac.uk/hugo); 
Swiss-Prot (www.expasy.ch.sprot); Prosite (www.expasy.ch/tools/scnpsitl.html); OMIM 
(www.ncbi.nlm.nih.gov/omim); GDB (www.gdb.org); and GeneCard 
(bioinformatics.weizmann.ac.il/cards). In a preferred embodiment, as described in Tables 1-3, 
the external database is GenBank and the associated databases maintained by the National 
Center for Biotechnology Information (NCBI) (www.ncbi.nlm.nih.gov). 
[0137] Any appropriate computer platform, user interface, etc. may be used to perform the 
necessary comparisons between sequence information, gene expression information and any 
other information in the database or information provided as an input. For example, a large 
number of computer workstations are available from a variety of manufacturers, such as 
those available from Silicon Graphics. Client/server environments, database servers and 
networks are also widely available and appropriate platforms for the databases of the 
invention. 

[0138] The databases of the invention may be used to produce, among other things, 
electronic Northerns that allow the user to determine the cell type or tissue in which a given 
gene is expressed and to allow determination of the abundance or expression level of a given 
gene in a particular tissue or cell. 

[0139] The databases of the invention may also be used to present information identifying 
the expression level in a tissue or cell of a set of genes comprising one or more of the genes 
in Tables 1-3, comprising the step of comparing the expression level of at least one gene in 
Tables 1-3 in a cell or tissue exposed to a test agent to the level of expression of the gene in 
the database. Such methods may be used to predict the toxic potential of a given compound 
by comparing the level of expression of a gene or genes in Tables 1-3 from a tissue or cell 
sample exposed to the test agent to the expression levels found in a control tissue or cell 
samples exposed to a standard toxin or hepatotoxin such as those herein described. Such 
methods may also be used in the drug or agent screening assays as described herein. 
[0140] 

Kits 

[0141] The invention further includes kits combining, in different combinations, high- 
density oligonucleotide arrays, reagents for use with the arrays, protein reagents encoded by 
the genes of the Tables, signal detection and array-processing instruments, gene expression 
databases and analysis and database management software described above. The kits may be 
used, for example, to predict or model the toxic response of a test compound, to monitor the 
progression of hepatic disease states, to identify genes that show promise as new drug targets 
and to screen known and newly designed drugs as discussed above. 

[0142] The databases packaged with the kits are a compilation of expression patterns from 
human or laboratory animal genes and gene fragments (corresponding to the genes of Tables 
1-3). In particular, the database software and packaged information that may contain the 
databases saved to a computer-readable medium include the expression results of Tables 1-3 
that can be used to predict toxicity of a test agent by comparing the expression levels of the 
genes of Tables 1-3 induced by the test agent to the expression levels presented in Tables 3A- 
3DD. In another format, database and software information may be provided in a remote 
electronic format, such as a website, the address of which may be packaged in the kit. 

[0143] The kits may be used in the pharmaceutical industry, where the need for early drug 
testing is strong due to the high costs associated with drug development, but where 
bioinformatics, in particular gene expression informatics, is still lacking. These kits will 
reduce the costs, time and risks associated with traditional new drug screening using cell 
cultures and laboratory animals. The results of large-scale drug screening of pre-grouped 
patient populations, pharmacogenomics testing, can also be applied to select drugs with 
greater efficacy and fewer side-effects. The kits may also be used by smaller biotechnology 
companies and research institutes who do not have the facilities for performing such large- 
scale testing themselves. 

[0144] Databases and software designed for use with microarrays are discussed in 
Balaban et al., U.S. Patent Nos. 6,229,911, a computer-implemented method for managing 
information, stored as indexed Tables 1-3, collected from small or large numbers of 
microarrays, and 6,185,561, a computer-based method with data mining capability for 
collecting gene expression level data, adding additional attributes and reformatting the data to 
produce answers to various queries. Chee et al, U.S. Patent No. 5,974,164, discloses a 
software-based method for identifying mutations in a nucleic acid sequence based on 
differences in probe fluorescence intensities between wild type and mutant sequences that 
hybridize to reference sequences. 

[0145] Without further description, it is believed that one of ordinary skill in the art can, 
using the preceding description and the following illustrative examples, make and utilize the 
compounds of the present invention and practice the claimed methods. The following 
working examples, therefore, specifically point out the preferred embodiments of the present 
invention, and are not to be construed as limiting in any way the remainder of the disclosure. 

EXAMPLES 

Example 1: Identification of Toxicity Markers 

[0146] The hepatotoxins acyclovir, amitriptyline, alpha-naphthylisothiocyanate (ANIT), 
acetaminophen, AY-25329, bicalutamide, carbon tetrachloride, clofibrate, cyproterone 
acetate (CPA), diclofenac, diflunisal, dioxin, 17α-ethinylestradiol, hydrazine, indomethacin, 
lipopolysaccharide, phenobarbital, tacrine, valproate, WY-14643, zileuton and control 
compositions were administered to male Sprague-Dawley rats at various time points using 
administration diluents, protocols and dosing regimes as previously described in the art and 
previously described in the priority applications discussed above. 

[0147] After administration, the dosed animals were observed and tissues were collected as 
described below: 



OBSERVATION OF ANIMALS 

[0148] 1. Clinical Observations- Twice daily: mortality and moribundity check. 
[0149] Cage Side Observations - skin and fur, eyes and mucous membrane, respiratory 
system, circulatory system, autonomic and central nervous system, somatomotor pattern, and 
behavior pattern. 

[0150] Potential signs of toxicity, including tremors, convulsions, salivation, diarrhea, 
lethargy, coma or other atypical behavior or appearance, were recorded as they occurred and 
included a time of onset, degree, and duration. 

[0151] 2. Physical Examinations- Prior to randomization, prior to initial treatment, and 
prior to sacrifice. 

[0152] 3. Body Weights- Prior to randomization, prior to initial treatment, and prior to 
sacrifice. 

CLINICAL PATHOLOGY 

[0153] 1. Frequency: Prior to necropsy. 

[0154] 2. Number of animals: All surviving animals. 

[0155] 3. Bleeding Procedure: Blood was obtained by puncture of the orbital 
sinus while under 70% CO2/30% O2 anesthesia. 

[0156] 4. Collection of Blood Samples: Approximately 0.5 mL of blood was collected 
into EDTA tubes for evaluation of hematology parameters. Approximately 1 mL of blood 
was collected into serum separator tubes for clinical chemistry analysis. Approximately 200 
µL of plasma was obtained and frozen at ~-80°C for test compound/metabolite estimation. 
An additional ~2 mL of blood was collected into a 15 mL conical polypropylene vial to 
which ~3 mL of Trizol was immediately added. The contents were immediately mixed with 
a vortex and by repeated inversion. The tubes were frozen in liquid nitrogen and stored at 
-80°C. 

TERMINATION PROCEDURES 
Terminal Sacrifice 

[0157] Approximately 1, 3, 6, 24, and 48 hours and 5-7 days after the initial dose, 
rats were weighed, physically examined, sacrificed by decapitation, and exsanguinated. The 
animals were necropsied within approximately five minutes of sacrifice. Separate sterile, 
disposable instruments were used for each animal, with the exception of bone cutters, which 
were used to open the skull cap. The bone cutters were dipped in disinfectant solution 
between animals. 

[0158] Necropsies were conducted on each animal following procedures approved by 
board-certified pathologists. 
[0159] Animals not surviving until terminal sacrifice were discarded without necropsy 
(following euthanasia by carbon dioxide asphyxiation, if moribund). The approximate time 
of death for moribund or found dead animals was recorded. 

Postmortem Procedures 

[0160] Fresh and sterile disposable instruments were used to collect tissues. Gloves were 
worn at all times when handling tissues or vials. All tissues were collected and frozen within 
approximately 5 minutes of the animal's death. The liver sections and kidneys were frozen 
within approximately 3-5 minutes of the animal's death. The time of euthanasia, an interim 
time point at freezing of liver sections and kidneys, and time at completion of necropsy were 
recorded. Tissues were stored at approximately -80°C or preserved in 10% neutral buffered 
formalin. 

Tissue Collection and Processing 

Liver 

[0161] 1. Right medial lobe - snap frozen in liquid nitrogen and stored at ~-80°C. 

[0162] 2. Left medial lobe - Preserved in 10% neutral-buffered formalin (NBF) and 

[0163] evaluated for gross and microscopic pathology. 

[0164] 3. Left lateral lobe - snap frozen in liquid nitrogen and stored at ~-80°C. 

Heart 

[0165] A sagittal cross-section containing portions of the two atria and of the two 
ventricles was preserved in 10% NBF. The remaining heart was frozen in liquid nitrogen 
and stored at ~-80°C. 

Kidneys (both) 

[0166] 1. Left - Hemi-dissected; half was preserved in 10% NBF and the remaining 
half was frozen in liquid nitrogen and stored at ~-80°C. 

[0167] 2. Right - Hemi-dissected; half was preserved in 10% NBF and the remaining 
half was frozen in liquid nitrogen and stored at ~-80°C. 

Testes (both) 

[0168] A sagittal cross-section of each testis was preserved in 10% NBF. The 
remaining testes were frozen together in liquid nitrogen and stored at ~-80°C. 
Brain (whole) 

[0169] A cross-section of the cerebral hemispheres and of the diencephalon was 
preserved in 10% NBF, and the rest of the brain was frozen in liquid nitrogen and stored at 
~-80°C. 

[0170] Microarray sample preparation was conducted with minor modifications, following 
the protocols set forth in the Affymetrix GeneChip Expression Analysis Manual. Frozen 
tissue was ground to a powder using a Spex Certiprep 6800 Freezer Mill. Total RNA was 
extracted with Trizol (GibcoBRL) utilizing the manufacturer's protocol. The total RNA yield 
for each sample was 200-500 µg per 300 mg tissue weight. mRNA was isolated using the 
Oligotex mRNA Midi kit (Qiagen) followed by ethanol precipitation. Double stranded 
cDNA was generated from mRNA using the Superscript Choice system (GibcoBRL). First 
strand cDNA synthesis was primed with a T7-(dT24) oligonucleotide. The cDNA was 
phenol-chloroform extracted and ethanol precipitated to a final concentration of 1 µg/ml. 
From 2 µg of cDNA, cRNA was synthesized using Ambion's T7 MegaScript in vitro 
Transcription Kit. 

[0171] To biotin label the cRNA, nucleotides Bio-11-CTP and Bio-16-UTP (Enzo 
Diagnostics) were added to the reaction. Following a 37°C incubation for six hours, 
impurities were removed from the labeled cRNA following the RNeasy Mini kit protocol 
(Qiagen). cRNA was fragmented (fragmentation buffer consisting of 200 mM Tris-acetate, 
pH 8.1, 500 mM KOAc, 150 mM MgOAc) for thirty-five minutes at 94°C. Following the 
Affymetrix protocol, 55 µg of fragmented cRNA was hybridized on the Affymetrix rat array 
set for twenty-four hours at 60 rpm in a 45°C hybridization oven. The chips were washed and 
stained with Streptavidin Phycoerythrin (SAPE) (Molecular Probes) in Affymetrix fluidics 
stations. To amplify staining, SAPE solution was added twice with an anti-streptavidin 
biotinylated antibody (Vector Laboratories) staining step in between. Hybridization to the 
probe arrays was detected by fluorometric scanning (Hewlett Packard Gene Array Scanner). 
Data was analyzed using Affymetrix GeneChip® version 2.0 and Expression Data Mining 
(EDMT) software (version 1.0), GeneExpress2000, and S-Plus. 

[0172] Table 1 discloses those genes that are differentially expressed upon exposure to the 
named toxins and their corresponding GenBank Accession and Sequence Identification 
numbers, the identities of the metabolic pathways in which the genes function, the gene 
names if known, and the UniGene cluster titles. The comparison code represents the various 
toxicity or liver pathology states that each gene is able to discriminate, as well as the individual 
toxin type associated with each gene. The codes are defined in Table 2. The GLGC ID is the 
internal Gene Logic identification number. 
[0173] Table 2 defines the comparison codes used in Table 1. 
[0174] Tables 3A-3DD disclose the summary statistics for each of the comparisons 
performed. Each of these tables contains a set of predictive genes and creates a model for 
predicting the hepatotoxicity of an unknown, i.e., untested, compound. Each gene is identified 
by its Gene Logic identification number and can be cross-referenced to a gene name and 
representative SEQ ID NO. in Table 1. For each comparison of gene expression levels 
between samples in the toxicity group (samples affected by exposure to a specific toxin) and 
samples in the non-toxicity group (samples not affected by exposure to that same specific 
toxin), the group mean (for toxicity group samples) is the mean signal intensity, as 
normalized for the various chip parameters that are being assayed. The non-group mean 
represents the mean signal intensity, as normalized for the various chip parameters that are 
being assayed, in samples from animals other than those treated with the high dose of the 
specific toxin. These animals were treated with a low dose of the specific toxin, or with 
vehicle alone, or with a different toxin. Samples in the toxicity groups were obtained from 
animals sacrificed at the timepoint(s) indicated in the tables, while samples in the non- 
toxicity groups were obtained from animals sacrificed at all time points in the experiments. 
For individual genes, an increase in the group mean compared to the non-group mean 
indicates up-regulation upon exposure to a toxin. Conversely, a decrease in the group mean 
compared to the non-group mean indicates down-regulation. 
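
By way of illustration only, the comparison of the group mean with the non-group mean for a single gene may be sketched as follows. The function name and the "unchanged" label for exactly equal means are assumptions made for this sketch and are not taken from the tables.

```python
from statistics import mean


def regulation_direction(group_values, non_group_values):
    """Compare the toxicity-group mean with the non-group mean for one gene.

    Both arguments are lists of normalized signal intensities.
    The returned labels are illustrative only.
    """
    group_mean = mean(group_values)
    non_group_mean = mean(non_group_values)
    if group_mean > non_group_mean:
        return "up-regulated"
    if group_mean < non_group_mean:
        return "down-regulated"
    return "unchanged"  # assumption: equal means are reported as no change
```

In practice the standard deviations reported alongside each mean would also be considered before a difference is treated as meaningful.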

[0175] The mean values are derived from Average Difference (AveDiff) values for a 
particular gene, averaged across the corresponding samples. Each individual Average 
Difference value is calculated by integrating the intensity information from multiple probe 
pairs that are tiled for a particular fragment. The normalization algorithm used to calculate 
the AveDiff is based on the observation that the expression intensity values from a single 
chip experiment have different distributions, depending on whether small or large expression 
values are considered. Small values, which are assumed to be mostly noise, are 
approximately normally distributed with mean zero, while larger values roughly obey a log- 
normal distribution; that is, their logarithms are normally distributed with some nonzero 
mean. 

[0176] The normalization process computes separate scale factors for "non-expressors" 
(small values) and "expressors" (large ones). The inputs to the algorithm are pre-normalized 
Average Difference values, which are already scaled to set the trimmed mean equal to 100. 
The algorithm computes the standard deviation $SD_{\text{noise}}$ of the negative values, which are 
assumed to come from non-expressors. It then multiplies all negative values, as well as all 
positive values less than $2.0 \cdot SD_{\text{noise}}$, by a scale factor proportional to $1/SD_{\text{noise}}$. 
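
The two-regime scaling described in this paragraph and in the paragraph that follows may be sketched as shown below. This is a minimal sketch under stated assumptions, not the algorithm as actually implemented: the proportionality constants (`noise_scale`, `log_scale`) are hypothetical parameters, the noise standard deviation is taken as the root mean square of the negative values about zero, and the sample standard deviation is used for the logarithms.

```python
import math


def normalize_avediff(values, noise_scale=1.0, log_scale=1.0):
    """Scale non-expressors and expressors separately, keeping the result continuous."""
    negatives = [v for v in values if v < 0]
    if not negatives:
        raise ValueError("at least one negative value is needed to estimate the noise SD")
    # Assumption: noise is centred on zero, so its SD is the RMS of the negative values.
    sd_noise = math.sqrt(sum(v * v for v in negatives) / len(negatives))
    threshold = 2.0 * sd_noise
    low_factor = noise_scale / sd_noise  # scale factor proportional to 1/SD_noise

    logs = [math.log(v) for v in values if v > threshold]
    if len(logs) < 2:
        raise ValueError("at least two expressor values are needed")
    mean_log = sum(logs) / len(logs)
    sd_log = math.sqrt(sum((x - mean_log) ** 2 for x in logs) / (len(logs) - 1))
    if sd_log == 0.0:
        raise ValueError("expressor values have no spread")
    exponent = log_scale / sd_log  # scale factor proportional to 1/SD_log(signal)

    # Second factor chosen so both branches agree exactly at the threshold.
    continuity = (threshold * low_factor) / math.exp(math.log(threshold) * exponent)

    def scale(v):
        if v <= threshold:
            return v * low_factor
        return continuity * math.exp(math.log(v) * exponent)

    return [scale(v) for v in values]
```

Because both scale factors are positive, the mapping preserves the rank order of the input values.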
[0177] Values greater than $2.0 \cdot SD_{\text{noise}}$ are assumed to come from expressors. For these 
values, the standard deviation $SD_{\log(\text{signal})}$ of the logarithms is calculated. The logarithms 
are then multiplied by a scale factor proportional to $1/SD_{\log(\text{signal})}$ and exponentiated. 
The resulting values are then multiplied by another scale factor, chosen so there will be no 
discontinuity in the normalized values from unscaled values on either side of $2.0 \cdot SD_{\text{noise}}$. 
Some AveDiff values may be negative due to the general noise involved in nucleic acid 
hybridization experiments. Although many conclusions can be made corresponding to a 
negative value on the GeneChip platform, it is difficult to assess the meaning behind the 
negative value for individual fragments. Our observations show that, although negative 
values are observed at times within the predictive gene set, these values reflect a real 
biological phenomenon that is highly reproducible across all the samples from which the 
measurement was taken. For this reason, those genes that exhibit a negative value are 
included in the predictive set. It should be noted that other platforms of gene expression 
measurement may be able to resolve the negative numbers for the corresponding genes. The 
predictive ability of each of those genes should extend across platforms, however. Each 
mean value is accompanied by the standard deviation for the mean. The linear discriminant 
analysis score (discriminant score), as disclosed in the tables, measures the ability of each 
gene to predict whether or not a sample is toxic. The discriminant score is calculated by the 
following steps: 

Calculation of a discriminant score 

[0178] Let $X_i$ represent the AveDiff values for a given gene across the Group 1 samples, 
$i = 1 \ldots n$. 

[0179] Let $Y_i$ represent the AveDiff values for a given gene across the Group 2 samples, 
$i = 1 \ldots t$. 

[0180] The calculations proceed as follows: 

[0181] Calculate the mean and standard deviation for the $X_i$'s and $Y_i$'s, and denote these by $m_X$, 
$m_Y$, $s_X$, $s_Y$. 

[0182] For all $X_i$'s and $Y_i$'s, evaluate the function 

$$f(z) = \frac{\frac{1}{s_Y}\exp\left(-\frac{1}{2}\left(\frac{z - m_Y}{s_Y}\right)^2\right)}{\frac{1}{s_Y}\exp\left(-\frac{1}{2}\left(\frac{z - m_Y}{s_Y}\right)^2\right) + \frac{1}{s_X}\exp\left(-\frac{1}{2}\left(\frac{z - m_X}{s_X}\right)^2\right)}$$

[0183] The number of correct predictions, say $P$, is then the number of $Y_i$'s such that 
$f(Y_i) > 0.5$ plus the number of $X_i$'s such that $f(X_i) < 0.5$. 

[0184] The discriminant score is then $P/(n+t)$. 
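
The single-gene discriminant score calculation set out above may be sketched as follows. The function names are hypothetical. The use of the sample standard deviation, and the return of 0.5 when both terms of the denominator underflow to zero, are assumptions of this sketch, since the description does not specify either point.

```python
import math
from statistics import mean, stdev


def gaussian_term(z, m, s):
    """(1/s) * exp(-0.5 * ((z - m) / s)^2), the term used in f(z)."""
    return (1.0 / s) * math.exp(-0.5 * ((z - m) / s) ** 2)


def f(z, m_x, s_x, m_y, s_y):
    """Relative weight of Group 2 (Y) versus Group 1 (X) at value z."""
    p_y = gaussian_term(z, m_y, s_y)
    p_x = gaussian_term(z, m_x, s_x)
    total = p_x + p_y
    # Assumption: if both terms underflow to zero, treat the value as undecided.
    return 0.5 if total == 0.0 else p_y / total


def discriminant_score(x_values, y_values):
    """Fraction of samples assigned to their own group: P / (n + t)."""
    m_x, s_x = mean(x_values), stdev(x_values)
    m_y, s_y = mean(y_values), stdev(y_values)
    if s_x == 0 or s_y == 0:
        raise ValueError("each group needs a non-zero standard deviation")
    correct = sum(1 for y in y_values if f(y, m_x, s_x, m_y, s_y) > 0.5)
    correct += sum(1 for x in x_values if f(x, m_x, s_x, m_y, s_y) < 0.5)
    return correct / (len(x_values) + len(y_values))
```

A score of 1.0 indicates that the gene separates the two groups completely, while a score near 0.5 indicates little discriminating ability for groups of similar size.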

[0185] Linear discriminant analysis uses both the individual measurements of each gene 
and the calculated measurements of all combinations of genes to classify samples. For each 
gene a weight is derived from the mean and standard deviation of the tox and nontox groups. 
The expression value of every gene is multiplied by its weight, and the sum of these values results in a collective 
discriminant score. This discriminant score is then compared against collective centroids of 
the tox and nontox groups. These centroids are the average of all tox and nontox samples 
respectively. Therefore, each gene contributes to the overall prediction. This contribution is 
dependent on weights that are large positive or negative numbers if the relative distances 
between the tox and nontox samples for that gene are large and small numbers if the relative 
distances are small. The discriminant score for each unknown sample and centroid values 
can be used to calculate a probability between zero and one as to the group in which the 
unknown sample belongs. 
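
One way to realize the weighted-sum scheme described above is sketched below. The exact weighting formula is not given in this description, so the sketch assumes a diagonal linear discriminant: each weight is the difference between the tox and nontox means divided by the pooled variance, the groups are assumed equally likely, and the probability is obtained from a logistic function of the distance between the sample score and the midpoint of the two centroid scores. All function names are hypothetical.

```python
import math
from statistics import mean, variance


def fit_weights(tox_samples, nontox_samples):
    """Return per-gene weights and the midpoint between the two centroid scores.

    Each sample is a list of expression values, one per gene, in a fixed gene order.
    """
    n_tox, n_non = len(tox_samples), len(nontox_samples)
    weights, midpoint = [], 0.0
    for g in range(len(tox_samples[0])):
        tox = [s[g] for s in tox_samples]
        non = [s[g] for s in nontox_samples]
        m_tox, m_non = mean(tox), mean(non)
        pooled = ((n_tox - 1) * variance(tox) + (n_non - 1) * variance(non)) / (n_tox + n_non - 2)
        # Assumption: a gene with no spread in either group contributes nothing.
        w = (m_tox - m_non) / pooled if pooled > 0 else 0.0
        weights.append(w)
        midpoint += w * (m_tox + m_non) / 2.0
    return weights, midpoint


def tox_probability(sample, weights, midpoint):
    """Probability between zero and one that the sample belongs to the tox group."""
    score = sum(w * x for w, x in zip(weights, sample))  # collective discriminant score
    log_odds = score - midpoint
    if log_odds >= 0:  # numerically stable logistic function
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)
```

Genes whose group means lie far apart relative to their variance receive large positive or negative weights, consistent with the behavior described above.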

Example 2: General Toxicity Modeling 

[0186] Samples were selected for grouping into tox-responding and non-tox-responding 
groups by examining each study individually with Principal Components Analysis (PCA) to 
determine which treatments had an observable response. Only groups where confidence of 
their tox-responding and non-tox-responding status was established were included in building 
a general tox model. 

[0187] Linear discriminant models were generated to describe toxic and non-toxic samples. 
The top discriminant genes and/or EST's were used to determine toxicity by calculating each 
gene's contribution with homo- and heteroscedastic treatment of variance and inclusion or 
exclusion of mutual information between genes. Prediction of samples within the database 
exceeded 80% true positives with a false positive rate of less than 5%. It was determined that 
combinations of genes and/or EST's generally provided a better predictive ability than 
individual genes and that the more genes and/or EST's used, the better the predictive ability. 
Although the preferred embodiment includes fifty or more genes, many pairings or greater 
combinations of genes and/or EST's can work better than individual genes. All combinations 
of two or more genes from the selected list could be used to predict toxicity. These 
combinations could be selected by pairing in an agglomerate, divisive, or random approach. 
Further, as yet undetermined genes and/or EST's could be combined with individual or 
combination of genes and/or EST's described here to increase predictive ability. However, 
the genes and/or EST's described here would contribute most of the predictive ability of any 
such undetermined combinations. 
[0188] Other variations on the above method can provide adequate predictive ability. 
These include selective inclusion of components via agglomerate, divisive, or random 
approaches or extraction of loading and combining them in agglomerate, divisive, or random 
approaches. Also, the use of composite variables in logistic regression to determine 
classification of samples can be accomplished with linear discriminant analysis, neural or 
Bayesian networks, or other forms of regression and classification based on categorical or 
continuous dependent and independent variables. 

Example 3: Modeling Methods 

[0189] The above modeling methods provide broad approaches of combining the 
expression of genes to predict sample toxicity. One could also provide no weight in a simple 
voting method or determine weights in a supervised or unsupervised method using 
agglomerate, divisive, or random approaches. All or selected combinations of genes may be 
combined in ordered, agglomerate, or divisive, supervised or unsupervised clustering 
algorithms with unknown samples for classification. Any form of correlation matrix may 
also be used to classify unknown samples. The spread of the group distribution and 
discriminant score alone provide enough information to enable a skilled person to generate all 
of the above types of models with accuracy that can exceed the discriminant ability of individual 
genes. Some examples of methods that could be used individually or in combination after 
transformation of data types include but are not limited to: Discriminant Analysis, Multiple 
Discriminant Analysis, logistic regression, multiple regression analysis, linear regression 
analysis, conjoint analysis, canonical correlation, hierarchical cluster analysis, k-means 
cluster analysis, self-organizing maps, multidimensional scaling, structural equation 
modeling, support vector machine determined boundaries, factor analysis, neural networks, 
Bayesian classifications, and resampling methods. 
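
As an illustration of the unweighted voting approach mentioned above, the following sketch lets each gene cast one vote according to which group's distribution makes the observed value more likely. The per-gene statistics tuple `(m_tox, s_tox, m_non, s_non)` and the function names are hypothetical, and a tie is counted as a non-tox vote by assumption.

```python
import math


def gene_votes_tox(z, m_tox, s_tox, m_non, s_non):
    """True if value z is more consistent with the tox group for this gene."""
    p_tox = (1.0 / s_tox) * math.exp(-0.5 * ((z - m_tox) / s_tox) ** 2)
    p_non = (1.0 / s_non) * math.exp(-0.5 * ((z - m_non) / s_non) ** 2)
    return p_tox > p_non  # assumption: ties count as a non-tox vote


def tox_vote_fraction(sample, gene_stats):
    """Fraction of genes voting 'tox'; every gene carries the same weight."""
    votes = sum(
        1 for z, stats in zip(sample, gene_stats) if gene_votes_tox(z, *stats)
    )
    return votes / len(sample)
```

A sample might then be called toxic when the returned fraction exceeds one half, although any other threshold could be chosen.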

Example 4: Grouping of Individual Compound and Pathology Classes 
[0190] Samples were grouped into individual pathology classes based on known 
toxicological responses and observed clinical chemical and pathology measurements or into 
early and late phases of observable toxicity within a compound (Tables 3A-3DD). The top 
10, 25, 50, or 100 genes based on individual discriminant scores were used in a model to ensure 
that combinations of genes provided a better prediction than individual genes. As described 
above, all combinations of two or more genes from this list could potentially provide better 
prediction than individual genes when selected in any order or by ordered, agglomerate, 
divisive, or random approaches. In addition, combining these genes with other genes could 
provide better predictive ability, but most of this predictive ability would come from the 
genes listed herein. 
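
Selection of the top-ranked genes by individual discriminant score, as described above, may be sketched as follows. The gene identifiers in the usage note are placeholders rather than actual Gene Logic identification numbers, and breaking ties by identifier is an assumption made only so that the result is reproducible.

```python
def top_genes(scores_by_gene, n):
    """Return the identifiers of the n genes with the highest discriminant scores.

    scores_by_gene maps a gene identifier to its individual discriminant score.
    Ties are broken by identifier (an assumption of this sketch).
    """
    ranked = sorted(scores_by_gene.items(), key=lambda item: (-item[1], item[0]))
    return [gene for gene, _ in ranked[:n]]
```

For example, `top_genes({"g1": 0.90, "g2": 0.75, "g3": 0.95}, 2)` selects `g3` and `g1`; the same call with n set to 10, 25, 50 or 100 would yield the gene sets used to build the combined models.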

[0191] Samples may be considered toxic if they score positive in any pathological or 
individual compound class represented here or in any modeling method mentioned under 
general toxicology models based on combination of individual time and dose grouping of 
individual toxic compounds obtainable from the data. The pathological groupings and early 
and late phase models are preferred examples of all obtainable combinations of sample time 
and dose points. Most logical groupings with one or more genes and one or more sample 
dose and time points should produce better predictions of general toxicity, pathological 
specific toxicity, or similarity to a known toxicant than individual genes. 

[0192] Although the present invention has been described in detail with reference to 
examples above, it is understood that various modifications can be made without departing 
from the spirit of the invention. Accordingly, the invention is limited only by the following 
claims. All cited patents, patent applications and publications referred to in this application 
are herein incorporated by reference in their entirety. 



